free-llm-api-resources
A list of free LLM inference resources accessible via API.
Free LLM API resources
This lists various services that provide free access or credits towards API-based LLM usage.
> [!NOTE]
> Please don't abuse these services, or we might lose them.

> [!WARNING]
> This list explicitly excludes any services that are not legitimate (e.g. services that reverse-engineer an existing chatbot).
Free Providers
| Provider | Provider Limits/Notes | Model Name | Model Limits |
|---|---|---|---|
| Groq | | Distil Whisper Large v3 | 7200 audio-seconds/minute, 2000 requests/day |
| | | Gemma 2 9B Instruct | 14400 requests/day, 15000 tokens/minute |
| | | Gemma 7B Instruct | 14400 requests/day, 15000 tokens/minute |
| | | LLaVA 1.5 7B | 14400 requests/day, 30000 tokens/minute |
| | | Llama 3 70B | 14400 requests/day, 6000 tokens/minute |
| | | Llama 3 70B - Groq Tool Use Preview | 14400 requests/day, 15000 tokens/minute |
| | | Llama 3 8B | 14400 requests/day, 30000 tokens/minute |
| | | Llama 3 8B - Groq Tool Use Preview | 14400 requests/day, 15000 tokens/minute |
| | | Llama 3.1 70B | 14400 requests/day, 20000 tokens/minute |
| | | Llama 3.1 8B | 14400 requests/day, 20000 tokens/minute |
| | | Llama 3.2 11B (Text Only) | 7000 requests/day, 7000 tokens/minute |
| | | Llama 3.2 11B Vision | 7000 requests/day, 7000 tokens/minute |
| | | Llama 3.2 1B | 7000 requests/day, 7000 tokens/minute |
| | | Llama 3.2 3B | 7000 requests/day, 7000 tokens/minute |
| | | Llama 3.2 90B (Text Only) | 7000 requests/day, 7000 tokens/minute |
| | | Llama Guard 3 8B | 14400 requests/day, 15000 tokens/minute |
| | | Mixtral 8x7B | 14400 requests/day, 5000 tokens/minute |
| | | Whisper Large v3 | 7200 audio-seconds/minute, 2000 requests/day |
| OpenRouter | | Gemma 2 9B Instruct | 20 requests/minute, 200 requests/day |
| | | Hermes 3 Llama 3.1 405B | 20 requests/minute, 200 requests/day |
| | | Liquid LFM 40B | 20 requests/minute, 200 requests/day |
| | | Llama 3 8B Instruct | 20 requests/minute, 200 requests/day |
| | | Llama 3.1 405B Instruct | 20 requests/minute, 200 requests/day |
| | | Llama 3.1 70B Instruct | 20 requests/minute, 200 requests/day |
| | | Llama 3.1 8B Instruct | 20 requests/minute, 200 requests/day |
| | | Llama 3.2 11B Vision Instruct | 20 requests/minute, 200 requests/day |
| | | Llama 3.2 1B Instruct | 20 requests/minute, 200 requests/day |
| | | Llama 3.2 3B Instruct | 20 requests/minute, 200 requests/day |
| | | Mistral 7B Instruct | 20 requests/minute, 200 requests/day |
| | | Mythomist 7B | 20 requests/minute, 200 requests/day |
| | | OpenChat 7B | 20 requests/minute, 200 requests/day |
| | | Phi-3 Medium 128k Instruct | 20 requests/minute, 200 requests/day |
| | | Phi-3 Mini 128k Instruct | 20 requests/minute, 200 requests/day |
| | | Qwen 2 7B Instruct | 20 requests/minute, 200 requests/day |
| | | Toppy M 7B | 20 requests/minute, 200 requests/day |
| | | Zephyr 7B Beta | 20 requests/minute, 200 requests/day |
| Google AI Studio | Data is used for training (when used outside of the UK/CH/EEA/EU). | Gemini 1.5 Flash | 1000000 tokens/minute, 1500 requests/day, 15 requests/minute |
| | | Gemini 1.5 Flash (Experimental) | 1000000 tokens/minute, 1500 requests/day, 5 requests/minute |
| | | Gemini 1.5 Flash-8B | 1000000 tokens/minute, 1500 requests/day, 15 requests/minute |
| | | Gemini 1.5 Flash-8B (Experimental) | 1000000 tokens/minute, 1500 requests/day, 15 requests/minute |
| | | Gemini 1.5 Pro | 32000 tokens/minute, 50 requests/day, 2 requests/minute |
| | | Gemini 1.5 Pro (Experimental) | 1000000 tokens/minute, 50 requests/day, 2 requests/minute |
| | | Gemini 1.0 Pro | 32000 tokens/minute, 1500 requests/day, 15 requests/minute |
| | | text-embedding-004 | 150 batch requests/minute, 1500 requests/minute, 100 content/batch |
| | | embedding-001 | |
| Lambda Labs (Free Preview) | Free for a limited time | Nous Hermes 3 Llama 3.1 405B (FP8) | |
| | | Liquid LFM 40B | |
| Mistral (La Plateforme) | Free tier (Experiment plan) requires opting into data training. | Open and Proprietary Mistral models | 1 request/second; 500,000 tokens/minute; 1,000,000,000 tokens/month |
| Mistral (Codestral) | Currently free to use, monthly subscription based, requires phone number verification. | Codestral | 30 requests/minute, 2000 requests/day |
| HuggingFace Serverless Inference | Limited to models smaller than 10GB. Some popular models are supported even if they exceed 10GB. | Various open models | 50 requests/hour (with an account) |
| SambaNova Cloud | | Llama 3.1 405B | 10 requests/minute |
| | | Llama 3.1 70B | 20 requests/minute |
| | | Llama 3.1 8B | 30 requests/minute |
| | | Llama 3.2 3B | 30 requests/minute |
| | | Llama 3.2 1B | 30 requests/minute |
| Cerebras | Waitlist. Free tier restricted to 8K context. | Llama 3.1 8B | 30 requests/minute, 60000 tokens/minute; 900 requests/hour, 1000000 tokens/hour; 14400 requests/day, 1000000 tokens/day |
| | | Llama 3.1 70B | 30 requests/minute, 60000 tokens/minute; 900 requests/hour, 1000000 tokens/hour; 14400 requests/day, 1000000 tokens/day |
| GitHub Models | Waitlist. Rate limits dependent on Copilot subscription tier. | AI21-Jamba-Instruct | |
| | | Cohere Command R | |
| | | Cohere Command R+ | |
| | | Cohere Embed v3 English | |
| | | Cohere Embed v3 Multilingual | |
| | | Meta-Llama-3-70B-Instruct | |
| | | Meta-Llama-3-8B-Instruct | |
| | | Meta-Llama-3.1-405B-Instruct | |
| | | Meta-Llama-3.1-70B-Instruct | |
| | | Meta-Llama-3.1-8B-Instruct | |
| | | Mistral Large | |
| | | Mistral Large (2407) | |
| | | Mistral Nemo | |
| | | Mistral Small | |
| | | OpenAI GPT-4o | |
| | | OpenAI GPT-4o mini | |
| | | OpenAI Text Embedding 3 (large) | |
| | | OpenAI Text Embedding 3 (small) | |
| | | Phi-3-medium instruct (128k) | |
| | | Phi-3-medium instruct (4k) | |
| | | Phi-3-mini instruct (128k) | |
| | | Phi-3-mini instruct (4k) | |
| | | Phi-3-small instruct (128k) | |
| | | Phi-3-small instruct (8k) | |
| | | Phi-3.5-mini instruct (128k) | |
| OVH AI Endpoints (Free Alpha) | Token expires every 2 weeks. | CodeLlama 13B Instruct | 12 requests/minute |
| | | Llama 2 13B Chat | 12 requests/minute |
| | | Llama 3 70B Instruct | 12 requests/minute |
| | | Llama 3 8B Instruct | 12 requests/minute |
| | | Llama 3.1 70B Instruct | 12 requests/minute |
| | | Mathstral 7B v0.1 | 12 requests/minute |
| | | Mistral 7B Instruct | 12 requests/minute |
| | | Mixtral 8x22B Instruct | 12 requests/minute |
| | | Mixtral 8x7B Instruct | 12 requests/minute |
| Cloudflare Workers AI | 10000 tokens/day | Deepseek Coder 6.7B Base (AWQ) | |
| | | Deepseek Coder 6.7B Instruct (AWQ) | |
| | | Deepseek Math 7B Instruct | |
| | | Discolm German 7B v1 (AWQ) | |
| | | Falcon 7B Instruct | |
| | | Gemma 2B Instruct (LoRA) | |
| | | Gemma 7B Instruct | |
| | | Gemma 7B Instruct (LoRA) | |
| | | Hermes 2 Pro Mistral 7B | |
| | | Llama 2 13B Chat (AWQ) | |
| | | Llama 2 7B Chat (FP16) | |
| | | Llama 2 7B Chat (INT8) | |
| | | Llama 2 7B Chat (LoRA) | |
| | | Llama 3 8B Instruct | |
| | | Llama 3 8B Instruct | |
| | | Llama 3 8B Instruct (AWQ) | |
| | | Llama 3.1 8B Instruct | |
| | | Llama 3.1 8B Instruct (AWQ) | |
| | | Llama 3.1 8B Instruct (FP8) | |
| | | Llama 3.2 11B Vision Instruct | |
| | | Llama 3.2 1B Instruct | |
| | | Llama 3.2 3B Instruct | |
| | | LlamaGuard 7B (AWQ) | |
| | | Mistral 7B Instruct v0.1 | |
| | | Mistral 7B Instruct v0.1 (AWQ) | |
| | | Mistral 7B Instruct v0.2 | |
| | | Mistral 7B Instruct v0.2 (LoRA) | |
| | | Neural Chat 7B v3.1 (AWQ) | |
| | | OpenChat 3.5 0106 | |
| | | OpenHermes 2.5 Mistral 7B (AWQ) | |
| | | Phi-2 | |
| | | Qwen 1.5 0.5B Chat | |
| | | Qwen 1.5 1.8B Chat | |
| | | Qwen 1.5 14B Chat (AWQ) | |
| | | Qwen 1.5 7B Chat (AWQ) | |
| | | SQLCoder 7B 2 | |
| | | Starling LM 7B Beta | |
| | | TinyLlama 1.1B Chat v1.0 | |
| | | Una Cybertron 7B v2 (BF16) | |
| | | Zephyr 7B Beta (AWQ) | |
| Together | | Llama 3.2 11B Vision Instruct | Free for 2024 |
| Cohere | 20 requests/minute, 1000 requests/month | Command-R | Shared Limit |
| | | Command-R+ | |
| Google Cloud Vertex AI | Very stringent payment verification for Google Cloud. | Llama 3.1 405B Instruct | Llama 3.1 API Service free during preview. 60 requests/minute |
| | | Llama 3.1 70B Instruct | Llama 3.1 API Service free during preview. 60 requests/minute |
| | | Llama 3.1 8B Instruct | Llama 3.1 API Service free during preview. 60 requests/minute |
| | | Llama 3.2 90B Vision Instruct | Llama 3.2 API Service free during preview. 30 requests/minute |
| | | Gemini Flash Experimental | Experimental Gemini model. 10 requests/minute |
| | | Gemini Pro Experimental | |
| glhf.chat (Free Beta) | Email for API access | Any model on Hugging Face that is runnable on vLLM and fits on an A100 node (~640GB VRAM), including Llama 3.1 405B at FP8 | |
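
Most limits in the table above combine a short window with a long one, such as the 20 requests/minute and 200 requests/day listed for OpenRouter. The following is a minimal client-side sketch of how a caller might stay under several such quotas at once. The class name `MultiWindowLimiter` and its methods are hypothetical, and it assumes sliding windows; a provider may count requests differently (for example with fixed calendar windows), so treat it as a courtesy throttle rather than a guarantee against rate-limit errors.

```python
import time
from collections import deque


class MultiWindowLimiter:
    """Hypothetical helper: enforces several request quotas at once.

    Assumes sliding windows, which may not match how a given
    provider actually counts requests.
    """

    def __init__(self, limits, clock=time.monotonic):
        # limits: iterable of (max_requests, window_seconds) pairs
        self.limits = list(limits)
        self.clock = clock
        # one timestamp queue per configured window
        self.history = [deque() for _ in self.limits]

    def _prune(self, now):
        # drop timestamps that have aged out of their window
        for (_, window), stamps in zip(self.limits, self.history):
            while stamps and now - stamps[0] >= window:
                stamps.popleft()

    def wait_time(self):
        """Seconds until one more request fits inside every window."""
        now = self.clock()
        self._prune(now)
        delay = 0.0
        for (max_requests, window), stamps in zip(self.limits, self.history):
            if len(stamps) >= max_requests:
                # the oldest recorded request must expire first
                delay = max(delay, stamps[0] + window - now)
        return delay

    def try_acquire(self):
        """Record a request and return True if every quota allows it."""
        if self.wait_time() > 0:
            return False
        now = self.clock()
        for stamps in self.history:
            stamps.append(now)
        return True


# Values taken from the OpenRouter rows above:
# 20 requests/minute and 200 requests/day.
openrouter_limiter = MultiWindowLimiter([(20, 60), (200, 86400)])
```

Before each API call, a caller would check `try_acquire()` and, if it returns `False`, sleep for `wait_time()` seconds before retrying. Token-based limits (tokens/minute) are not covered by this sketch.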
Providers with trial credits
| Provider | Credits | Requirements | Models |
|---|---|---|---|
| Together | $5 | | Various open models |
| Fireworks | $1 | | Various open models |
| Unify | $10 (+$40 for getting in contact) | | Routes to other providers, various open models and proprietary models (OpenAI, Gemini, Anthropic, Mistral, Perplexity, etc.) |
| DeepInfra | $1.80 | | Various open models |
| NVIDIA NIM | 1000 API calls | | Various open models |
| AI21 | $10 for 3 months | | Jamba/Jurassic-2 |
| NLP Cloud | $15 | Phone number verification | Various open models |
| Upstage | $10 for 3 months | | Solar Pro/Mini |
| Hyperbolic | $10 | | DeepSeek V2.5 |
| | | | Hermes 3 Llama 3.1 70B |
| | | | Llama 3 70B Instruct |
| | | | Llama 3.1 405B Base |
| | | | Llama 3.1 405B Base (FP8) |
| | | | Llama 3.1 405B Instruct |
| | | | Llama 3.1 70B Instruct |
| | | | Llama 3.1 8B Instruct |
| | | | Llama 3.2 3B Instruct |
| | | | Llama 3.2 90B Vision |
| | | | Llama 3.2 90B Vision Instruct |
| | | | Pixtral 12B (2409) |
| | | | Qwen2-VL 72B Instruct |
| | | | Qwen2-VL 7B Instruct |
| | | | Qwen2.5 72B Instruct |