add max tokens to health check
Title
health check should only need a minimal token completion call
Relevant issues
Type
🐛 Bug Fix
Changes
With the health check endpoint we really only care if we get a 200 status back not the output, so its best to send max_tokens, otherwise as we have seen, sending a health check every 300 seconds can get very expensive for output tokens.
The latest updates on your projects. Learn more about Vercel for Git ↗︎
| Name | Status | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| litellm | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Dec 3, 2024 0:32am |
@krrishdholakia can you take a look at this PR please? The health check ended up costing us a lot of money when its just looking at the return not content.
Makes sense. And sorry for that. Yes I'll merge this in today