[Feature]: support service_tier for o3 and o4-mini
The Feature
Hi,
Supporting the service_tier parameter ("flex") can cut costs roughly in half when using o3 or o4-mini.
See: https://platform.openai.com/docs/guides/flex-processing
Motivation, pitch
It cuts costs by half, which is always nice :)
Are you a ML Ops Team?
No
Twitter / LinkedIn details
No response
According to the documentation, any parameters not on this list are considered provider-specific and are passed directly to the LLM API.
So you can use this parameter directly. I have tried it, and it works fine.
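For reference, a minimal sketch of what worked for me. The pass-through behavior is what the docs describe for non-standard parameters; the long timeout value is my own assumption based on OpenAI's flex-processing guide (flex requests can queue for a while), not a LiteLLM requirement:

```python
# Minimal sketch: pass service_tier straight through litellm.completion.
# Assumes OPENAI_API_KEY is set in the environment.
import litellm

response = litellm.completion(
    model="openai/o3",
    messages=[{"role": "user", "content": "Hello"}],
    service_tier="flex",  # not a LiteLLM-standard param, forwarded as-is to OpenAI
    timeout=900,          # assumption: flex requests may queue, so allow a longer timeout
)
print(response.choices[0].message.content)
```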
Is there a way to add this parameter to the model endpoint stored in the litellm gateway through the UI, so that all calls to that endpoint are sent with the service_tier parameter set?
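Not sure about the UI, but a config-file sketch like the following should achieve the same effect, assuming extra keys under litellm_params are forwarded with every request to that deployment (the o3-flex alias here is hypothetical):

```yaml
model_list:
  - model_name: o3-flex            # hypothetical alias clients would call
    litellm_params:
      model: openai/o3
      api_key: os.environ/OPENAI_API_KEY
      service_tier: "flex"         # assumption: forwarded on every call to this deployment
```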