[Feature]: Proxy Config: Cost specification per 1M tokens
The Feature
The proxy configuration provides two additional keys for each model:
- `input_cost_per_1m_token`
- `output_cost_per_1m_token`

These are then divided by 1e6 internally to compute the values of the existing fields `input_cost_per_token` / `output_cost_per_token`.
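A minimal sketch of how this conversion could work, assuming the new keys live alongside the existing ones in `litellm_params` (the function name and structure are hypothetical, not existing LiteLLM code):

```python
def resolve_per_token_costs(litellm_params: dict) -> dict:
    """Derive the existing per-token fields from the proposed per-1M-token keys."""
    if "input_cost_per_1m_token" in litellm_params:
        litellm_params["input_cost_per_token"] = (
            litellm_params["input_cost_per_1m_token"] / 1_000_000
        )
    if "output_cost_per_1m_token" in litellm_params:
        litellm_params["output_cost_per_token"] = (
            litellm_params["output_cost_per_1m_token"] / 1_000_000
        )
    return litellm_params
```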
Motivation, pitch
Situation
As a proxy admin maintaining many models from Azure OpenAI with custom billing, we need to specify the input and output cost for each model. The Azure cost calculator provides these per 1K / 1M tokens, which is generally a much better scale to think in when running many LLM-based workloads, each of which consumes a huge number of tokens.
Problem
When specifying the cost per token in LiteLLM, I currently need to double-check each value because the numbers are so small (and my YAML LSP automatically rewrites them to values like 1.2e-7 or 2.62e-6).
In this format it is pretty hard to spot errors in the cost config; this would be much easier with a format like `input_cost_per_1m_token: 0.12` or `output_cost_per_1m_token: 2.62`.
Comparison
Current config
```yaml
proxy_config:
  model_list:
    - model_name: gpt-4o-mini
      litellm_params:
        model: azure/gpt-4o-mini
        api_key: os.environ/AZURE_API_KEY
        api_base: os.environ/AZURE_OPENAI_BASE_URL
        api_version: 2024-08-01-preview
        input_cost_per_token: 1.6e-7 # alternatively 0.00000016
        output_cost_per_token: 6.3e-7 # alternatively 0.00000063
      model_info:
        version: 2024-07-18
        rate_limit: 2M TPM
```
Proposed config
```yaml
proxy_config:
  model_list:
    - model_name: gpt-4o-mini
      litellm_params:
        model: azure/gpt-4o-mini
        api_key: os.environ/AZURE_API_KEY
        api_base: os.environ/AZURE_OPENAI_BASE_URL
        api_version: 2024-08-01-preview
        input_cost_per_1m_token: 0.16 # easier to read, less prone to errors
        output_cost_per_1m_token: 0.63 # easier to read, less prone to errors
      model_info:
        version: 2024-07-18
        rate_limit: 2M TPM
```
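As a quick sanity check (using the values from the two configs above), the proposed per-1M values divided by 1e6 reproduce the current per-token values:

```python
import math

# 0.16 per 1M tokens == 1.6e-7 per token; 0.63 per 1M tokens == 6.3e-7 per token
assert math.isclose(0.16 / 1_000_000, 1.6e-7)
assert math.isclose(0.63 / 1_000_000, 6.3e-7)
```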
Are you a ML Ops Team?
Yes
Twitter / LinkedIn details
https://www.linkedin.com/in/fabian-reinold/