
Cerebras inference support

Open · neoOpus opened this issue 6 months ago · 1 comment

Hi,

I would like to know whether anyone has worked on making humanify support Cerebras inference. It is OpenAI-compatible and could be a better alternative in terms of speed and cost:

https://inference-docs.cerebras.ai/resources/openai

neoOpus · Jun 29 '25 13:06

> as it is compatible with OpenAI

@neoOpus Have you tried using the humanify openai --baseURL param in the way they suggest?

  • https://inference-docs.cerebras.ai/resources/openai#configuring-openai-to-use-cerebras-api
    • Configuring OpenAI to Use Cerebras API

https://github.com/jehna/humanify/blob/7beba2d32433e58bb77d0e1b0eda01c470fec3e2/src/commands/openai.ts#L20-L24
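
If it works the way it does with other OpenAI-compatible providers, the invocation would presumably look something like the sketch below. The base URL and model ID are taken from the Cerebras docs linked above; the flag names follow humanify's openai command, but I haven't verified this end to end, so treat it as an untested sketch:

```shell
# Hypothetical sketch: point humanify's OpenAI provider at Cerebras'
# OpenAI-compatible endpoint. Unverified; check `humanify openai --help`
# for the exact flag names in your installed version.
export CEREBRAS_API_KEY="csk-..."

humanify openai \
  --baseURL "https://api.cerebras.ai/v1" \
  --apiKey "$CEREBRAS_API_KEY" \
  --model "llama-3.3-70b" \
  obfuscated.js
```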

I'd be interested to hear if you manage to get it to work, and also your feedback on the speed differences, how effective the different models are when used with humanify, etc.


It seems it's also usable via OpenRouter:

  • https://github.com/jehna/humanify/issues/416
    • https://inference-docs.cerebras.ai/resources/openrouter-cerebras
    • https://openrouter.ai/provider/cerebras

These seem to be the models currently available:

  • https://inference-docs.cerebras.ai/introduction
    • The Cerebras Inference API currently provides access to the following models:

      Model Name                       Model ID                         Parameters    Speed (tokens/s)
      Llama 4 Scout                    llama-4-scout-17b-16e-instruct   109 billion   ~2600
      Llama 3.1 8B                     llama3.1-8b                      8 billion     ~2200
      Llama 3.3 70B                    llama-3.3-70b                    70 billion    ~2100
      Qwen 3 32B*                      qwen-3-32b                       32 billion    ~2100
      DeepSeek R1 Distill Llama 70B*   deepseek-r1-distill-llama-70b    70 billion    ~1700

The pricing:

  • https://inference-docs.cerebras.ai/support/pricing
    • Pricing

    • Our free tier supports a context length of 8,192 tokens. For all supported models, we also offer context lengths up to 128K upon request.

    • https://inference-docs.cerebras.ai/support/pricing#exploration-tier-pricing
      • Model                           Speed            Input            Output
        Llama 4 Scout                   ~2600 tokens/s   $0.65/M tokens   $0.85/M tokens
        Llama 3.1 8B                    ~2200 tokens/s   $0.10/M tokens   $0.10/M tokens
        Llama 3.3 70B                   ~2100 tokens/s   $0.85/M tokens   $1.20/M tokens
        Qwen 3 32B                      ~2100 tokens/s   $0.40/M tokens   $0.80/M tokens
        DeepSeek R1 Distill Llama 70B   ~1700 tokens/s   $2.20/M tokens   $2.50/M tokens
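
To put that pricing in perspective, here's a rough back-of-the-envelope cost estimate for a single run. The per-million-token prices are from the table above; the token counts are hypothetical placeholders (actual usage will depend heavily on the size of the bundle being deobfuscated):

```python
# Rough cost estimate against Cerebras' exploration-tier pricing.
# Prices are ($/M input tokens, $/M output tokens) from the table above.
PRICING = {
    "llama3.1-8b":   (0.10, 0.10),
    "llama-3.3-70b": (0.85, 1.20),
    "qwen-3-32b":    (0.40, 0.80),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    in_price, out_price = PRICING[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# e.g. a hypothetical run of ~2M input / ~0.5M output tokens on Llama 3.3 70B:
cost = estimate_cost("llama-3.3-70b", 2_000_000, 500_000)
print(f"${cost:.2f}")  # → $2.30
```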

And the rate limits:

  • https://inference-docs.cerebras.ai/support/rate-limits
    • Rate Limits

And further docs about tool use/function calling:

  • https://inference-docs.cerebras.ai/capabilities/tool-use
    • Tool Use

  • https://inference-docs.cerebras.ai/agent-bootcamp/section-2
    • Tool Use and Function Calling
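
The tool-use docs matter here because humanify's rename step relies on OpenAI-style function calling, so whether Cerebras' implementation accepts that payload shape is exactly what would need testing. For reference, an OpenAI-style request body with a tools schema looks like the following; the rename_variable tool is a simplified stand-in for illustration, not humanify's actual schema:

```python
# Sketch of an OpenAI-style function-calling request body, the kind of payload
# an OpenAI-compatible endpoint like Cerebras' would need to accept.
# The "rename_variable" tool is a made-up example, not humanify's real schema.
import json

request_body = {
    "model": "llama-3.3-70b",
    "messages": [
        {"role": "user", "content": "Suggest a better name for the variable `a`."}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "rename_variable",
                "description": "Rename a minified variable to something descriptive.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "new_name": {
                            "type": "string",
                            "description": "The suggested descriptive name.",
                        }
                    },
                    "required": ["new_name"],
                },
            },
        }
    ],
    # Force the model to call the tool rather than reply in free text.
    "tool_choice": {"type": "function", "function": {"name": "rename_variable"}},
}

print(json.dumps(request_body, indent=2))
```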


See Also:

  • https://github.com/jehna/humanify/issues/400
  • https://github.com/jehna/humanify/issues/84

0xdevalias · Jun 30 '25 03:06