
[Feature]: Proxy Config: Cost specification per 1M tokens

Open · freinold opened this issue on Dec 4, 2024 · 0 comments

The Feature

The proxy configuration should accept two additional keys for each model:

  • input_cost_per_1m_token
  • output_cost_per_1m_token

These would then be divided by 1e6 internally to compute the values of the existing variables input_cost_per_token / output_cost_per_token; a sketch of this conversion follows below.
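For illustration, a minimal sketch of that normalization step in plain Python. The helper name normalize_cost_config is a hypothetical placeholder, not LiteLLM's actual internals:

  def normalize_cost_config(litellm_params: dict) -> dict:
      """Convert the proposed per-1M-token cost keys to the existing per-token keys."""
      for prefix in ("input", "output"):
          per_1m_key = f"{prefix}_cost_per_1m_token"
          if per_1m_key in litellm_params:
              # e.g. input_cost_per_1m_token: 0.16 -> input_cost_per_token: 1.6e-07
              litellm_params[f"{prefix}_cost_per_token"] = litellm_params.pop(per_1m_key) / 1e6
      return litellm_params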

Motivation, pitch

Situation

As proxy admins maintaining many Azure OpenAI models with custom billing, we need to specify the input and output cost for each model. The Azure pricing calculator lists these costs per 1K / 1M tokens, which is a much more natural scale to reason about when running many LLM-based workloads, each of which consumes a huge number of tokens.

Problem

When specifying the cost per token in LiteLLM, I currently need to double-check each value because they are so small (and my YAML LSP automatically reformats them to values like 1.2e-7 or 2.62e-6). In this notation it is hard to spot errors in the cost config; this would be much easier in a more readable format such as input_cost_per_1m_token: 0.12 or output_cost_per_1m_token: 2.62.
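The arithmetic behind that readability gap, as plain Python (values taken from the example just above):

  per_token = 1.2e-7        # the current notation: hard to eyeball in a config file
  per_1m = per_token * 1e6  # ~0.12 USD per 1M tokens: easy to sanity-check against a price sheet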

Comparison

Current config

  proxy_config:
    model_list:
    - model_name: gpt-4o-mini
      litellm_params:
        model: azure/gpt-4o-mini
        api_key: os.environ/AZURE_API_KEY
        api_base: os.environ/AZURE_OPENAI_BASE_URL
        api_version: 2024-08-01-preview
        input_cost_per_token: 1.6e-7 # alternatively 0.00000016
        output_cost_per_token: 6.3e-7 # alternatively 0.00000063
      model_info:
        version: 2024-07-18
        rate_limit: 2M TPM

Proposed config

  proxy_config:
    model_list:
    - model_name: gpt-4o-mini
      litellm_params:
        model: azure/gpt-4o-mini
        api_key: os.environ/AZURE_API_KEY
        api_base: os.environ/AZURE_OPENAI_BASE_URL
        api_version: 2024-08-01-preview
        input_cost_per_1m_token: 0.16 # easier to read, less prone to errors
        output_cost_per_1m_token: 0.63 # easier to read, less prone to errors
      model_info:
        version: 2024-07-18
        rate_limit: 2M TPM
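As a quick sanity check that the two configs are equivalent, dividing the proposed per-1M values by 1e6 recovers the current per-token values (plain Python; math.isclose guards against floating-point representation differences):

  import math

  # proposed per-1M values / 1e6 == current per-token values
  assert math.isclose(0.16 / 1e6, 1.6e-7)
  assert math.isclose(0.63 / 1e6, 6.3e-7)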

Are you an ML Ops Team?

Yes

Twitter / LinkedIn details

https://www.linkedin.com/in/fabian-reinold/
