[Bug]: The meaning of "max_tokens" reported by /model/info is inconsistent
What happened?
We've noticed that "max_tokens" (from https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json or https://github.com/BerriAI/litellm/blob/main/litellm/model_prices_and_context_window_backup.json) sometimes means max_input_tokens and sometimes max_output_tokens. We were internally relying on it to mean max_input_tokens. We'll switch to explicitly using max_input_tokens instead, but it does seem odd that the meaning of max_tokens is inconsistent.
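For reference, a minimal sketch of the workaround described above: read max_input_tokens explicitly from the published model map and fall back to max_tokens only when the explicit field is absent. (The raw.githubusercontent.com URL is just the raw variant of the blob URL above; the fallback logic is our assumption, not litellm's documented behavior.)

```python
import json
import urllib.request

# Raw variant of the model map URL referenced in this issue.
MODEL_MAP_URL = (
    "https://raw.githubusercontent.com/BerriAI/litellm/main/"
    "model_prices_and_context_window.json"
)

def max_input_tokens(model: str) -> int | None:
    """Prefer the explicit max_input_tokens field; fall back to the
    ambiguous max_tokens only when the explicit field is missing."""
    with urllib.request.urlopen(MODEL_MAP_URL) as resp:
        model_map = json.load(resp)
    info = model_map.get(model, {})
    return info.get("max_input_tokens", info.get("max_tokens"))
```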
Continuing the discussion from LinkedIn:
Hey @jeromeroussin, would it be simpler if:
- max_tokens = input + output tokens combined
- max_input_tokens = max tokens you can put in
- max_output_tokens = max tokens you can ask it to generate
In dev code, you'd probably need an if/else check (see the sketch below):
- if max_tokens == max_input_tokens: return max_tokens * 0.7, to leave some buffer for output tokens
- elif max_tokens == max_input_tokens + max_output_tokens: return max_input_tokens, since the decision re: buffer is probably implementation-specific
?
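A runnable sketch of that if/else, assuming a model-info dict shaped like the JSON above (the 0.7 buffer is the example figure from this comment; the final fallback branch is an assumption):

```python
def usable_input_tokens(info: dict) -> int:
    """Decide how many prompt tokens to send, given a model-info dict
    with max_tokens / max_input_tokens / max_output_tokens as proposed."""
    max_tokens = info["max_tokens"]
    max_in = info.get("max_input_tokens")
    max_out = info.get("max_output_tokens")

    if max_in is not None and max_tokens == max_in:
        # max_tokens is really the input limit: keep a buffer for output.
        return int(max_tokens * 0.7)
    if max_in is not None and max_out is not None and max_tokens == max_in + max_out:
        # max_tokens is the combined window: the full input limit is usable;
        # how much to reserve for output is implementation-specific.
        return max_in
    # Metadata doesn't disambiguate: fall back conservatively (assumption).
    return int(max_tokens * 0.7)

# e.g. usable_input_tokens({"max_tokens": 8192, "max_input_tokens": 8192}) -> 5734
```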
Closing as not planned, in favor of using max_input_tokens and max_output_tokens.
Can revisit this though.