Summary
Groq provides a high-speed AI inference engine, the LPU™ Inference Engine, available through cloud and on-premise solutions. It offers API access for developers to integrate various openly available AI models, including large language models, text-to-speech models, and automatic speech recognition models. Groq also provides enterprise solutions for large-scale deployments and custom model requests.
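As a rough illustration of the developer API access described above, the sketch below builds (without sending) a chat completion request using only the Python standard library. The endpoint URL and model id are assumptions for illustration based on Groq's OpenAI-compatible API style, not details confirmed by this listing.

```python
import json
import urllib.request

# Assumed endpoint: Groq exposes an OpenAI-compatible chat completions API.
GROQ_API_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion HTTP request."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        GROQ_API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Example: model id is a placeholder; substitute one from the pricing list.
req = build_chat_request("llama-3.3-70b-versatile", "Hello!", "YOUR_GROQ_API_KEY")
print(req.full_url)
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) would require a valid API key from a Groq developer account.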
Features (4/13)
Must Have (4 of 5)
Conversational AI
API Access
Fine-Tuning & Custom Models
Enterprise Solutions
Safety & Alignment Framework
Other (0 of 8)
Image Generation
Code Generation
Multimodal AI
Research & Publications
Security & Red Teaming
Synthetic Media Provenance
Threat Intelligence Reporting
Global Affairs & Policy
Pricing (Usage-based)
Llama 4 Scout (17Bx16E): 460 Tokens per Second
Llama 4 Maverick (17Bx128E): 581 Tokens per Second
Llama Guard 4 12B 128k: 325 Tokens per Second
DeepSeek R1 Distill Llama 70B: 275 Tokens per Second
Qwen3 32B 131k: 491 Tokens per Second
Qwen QwQ 32B (Preview) 128k: 400 Tokens per Second
Mistral Saba 24B: 330 Tokens per Second
Llama 3.3 70B Versatile 128k: 275 Tokens per Second
Llama 3.1 8B Instant 128k: 750 Tokens per Second
Llama 3 70B 8k: 330 Tokens per Second
Llama 3 8B 8k: 1250 Tokens per Second
Gemma 2 9B 8k: 500 Tokens per Second
Llama Guard 3 8B 8k: 765 Tokens per Second
PlayAI Dialog v1.0: 140 Characters per Second
Whisper V3 Large: 189x Speed Factor
Whisper Large v3 Turbo: 216x Speed Factor
Distil-Whisper: 250x Speed Factor
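The throughput figures above translate directly into generation latency. A minimal back-of-envelope sketch (the token counts are illustrative, not from the listing):

```python
# Estimate the time to generate a response of n_tokens at a given
# decode throughput (tokens per second, as quoted in the pricing list).
def generation_time_s(n_tokens: int, tokens_per_second: float) -> float:
    return n_tokens / tokens_per_second

# e.g. a 1000-token answer at 750 tokens/second (Llama 3.1 8B Instant)
print(round(generation_time_s(1000, 750), 2))  # 1.33
```

So at the quoted speeds, even the larger models in the list complete a 1000-token response in a few seconds.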
Rationale
Groq offers an AI inference engine with API access for developers, supporting various large language models for conversational AI. It explicitly mentions 'Enterprise Access' for custom and large-scale needs, and its pricing page states that 'Other models are available for specific customer requests including fine tuned models,' indicating support for custom models. While Groq's primary focus is inference speed, its core functionality aligns with the OpenAI Platform's offerings for developers and enterprises.