Cerebras inference support
Hi,
Has anyone worked on making humanify support Cerebras inference? It is OpenAI-compatible, and can be a better alternative in terms of speed and cost.
https://inference-docs.cerebras.ai/resources/openai
@neoOpus Have you tried using the `humanify openai --baseURL` param in the way they suggest?
- https://inference-docs.cerebras.ai/resources/openai#configuring-openai-to-use-cerebras-api
  > Configuring OpenAI to Use Cerebras API
https://github.com/jehna/humanify/blob/7beba2d32433e58bb77d0e1b0eda01c470fec3e2/src/commands/openai.ts#L20-L24
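Something like the following might work (an untested sketch: the base URL and model ID are copied from the Cerebras docs, and I'm assuming the flag names from humanify's `openai` command — check `humanify openai --help` for the exact options):

```shell
# Untested sketch: point humanify's OpenAI-compatible client at Cerebras.
# Base URL and model ID come from the Cerebras docs; the API key comes
# from your Cerebras account and is assumed to be in $CEREBRAS_API_KEY.
humanify openai \
  --baseURL "https://api.cerebras.ai/v1" \
  --model "llama-3.3-70b" \
  --apiKey "$CEREBRAS_API_KEY" \
  obfuscated.js
```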
I'd be interested to hear if you manage to get it to work, along with your feedback on the speed differences, how effective the different models are when used with humanify, etc.
It seems it's also usable via OpenRouter:
- https://github.com/jehna/humanify/issues/416
- https://inference-docs.cerebras.ai/resources/openrouter-cerebras
- https://openrouter.ai/provider/cerebras
These seem to be the models currently available:
- https://inference-docs.cerebras.ai/introduction
  > The Cerebras Inference API currently provides access to the following models:
  >
  > | Model Name | Model ID | Parameters | Speed (tokens/s) |
  > | --- | --- | --- | --- |
  > | Llama 4 Scout | llama-4-scout-17b-16e-instruct | 109 billion | ~2600 tokens/s |
  > | Llama 3.1 8B | llama3.1-8b | 8 billion | ~2200 tokens/s |
  > | Llama 3.3 70B | llama-3.3-70b | 70 billion | ~2100 tokens/s |
  > | Qwen 3 32B* | qwen-3-32b | 32 billion | ~2100 tokens/s |
  > | DeepSeek R1 Distill Llama 70B* | deepseek-r1-distill-llama-70b | 70 billion | ~1700 tokens/s |
The pricing:
- https://inference-docs.cerebras.ai/support/pricing
  > Pricing
  >
  > Our free tier supports a context length of 8,192 tokens. For all supported models, we also offer context lengths up to 128K upon request.
- https://inference-docs.cerebras.ai/support/pricing#exploration-tier-pricing
  > | Model | Speed | Input | Output |
  > | --- | --- | --- | --- |
  > | Llama 4 Scout | ~2600 tokens/s | $0.65/M tokens | $0.85/M tokens |
  > | Llama 3.1 8B | ~2200 tokens/s | $0.10/M tokens | $0.10/M tokens |
  > | Llama 3.3 70B | ~2100 tokens/s | $0.85/M tokens | $1.20/M tokens |
  > | Qwen 3 32B | ~2100 tokens/s | $0.40/M tokens | $0.80/M tokens |
  > | Deepseek R1 Distill Llama 70B | ~1700 tokens/s | $2.20/M tokens | $2.50/M tokens |
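To put those exploration-tier prices in perspective, here's a quick back-of-the-envelope cost estimate (my own sketch: the per-million-token prices are copied from the pricing page above, the token counts are made up):

```python
# Rough cost estimate from the exploration-tier prices quoted above.
# Each entry maps a model ID to (input, output) USD prices per million tokens.
PRICES = {
    "llama3.1-8b": (0.10, 0.10),
    "llama-3.3-70b": (0.85, 1.20),
    "qwen-3-32b": (0.40, 0.80),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one job at exploration-tier prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical unminify job: 2M input tokens, 0.5M output tokens.
print(f"${estimate_cost('llama-3.3-70b', 2_000_000, 500_000):.2f}")  # $2.30
```

So even a fairly large job on the 70B model would land in the low single-digit dollars, if these prices hold.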
And the rate limits:
- https://inference-docs.cerebras.ai/support/rate-limits
  > Rate Limits
And further docs about tool use/function calling:
- https://inference-docs.cerebras.ai/capabilities/tool-use
  > Tool Use
- https://inference-docs.cerebras.ai/agent-bootcamp/section-2
  > Tool Use and Function Calling
See Also:
- https://github.com/jehna/humanify/issues/400
- https://github.com/jehna/humanify/issues/84