[FEAT]: Add Support to Anthropic & OpenAI Batch APIs
What would you like to see?
Hey there! First off, thank you for working on this great project :)
Is it possible to add support for the Batch APIs provided by Anthropic and OpenAI? These APIs essentially offer a 50% discount on API calls in exchange for allowing the responses to take up to 24 hours (so the providers can run them when their servers aren't overloaded).
This is useful for saving money on calls that aren't needed immediately (for example, a request to summarize a book, suggest a software design, etc.), especially when using the more expensive models, like Claude Opus and OpenAI o1, with a large context.
The way it works seems to be that once the batch is submitted, you can query the API to check whether it has finished, so the client needs to poll at some interval (for example, every minute; maybe make it configurable in the settings), and once the status says the batch is ready, you can fetch the output.
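For illustration, here is a minimal sketch of what that polling loop might look like against OpenAI's Batch API. The endpoint and status values come from their docs; the interval, types, and function name are just assumptions, not a proposed implementation:

```typescript
// Minimal polling sketch (TypeScript). Assumes a batch was already created
// via POST /v1/batches; names like waitForBatch are hypothetical.
const OPENAI_API = "https://api.openai.com/v1";

type BatchStatus =
  | "validating" | "in_progress" | "finalizing"
  | "completed" | "failed" | "expired" | "cancelling" | "cancelled";

interface Batch {
  id: string;
  status: BatchStatus;
  output_file_id?: string; // set once status is "completed"
}

// Poll GET /v1/batches/{id} until the batch reaches a terminal state.
async function waitForBatch(
  batchId: string,
  apiKey: string,
  intervalMs = 60_000, // e.g. every minute; could be a user setting
): Promise<Batch> {
  for (;;) {
    const res = await fetch(`${OPENAI_API}/batches/${batchId}`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    if (!res.ok) throw new Error(`Batch status check failed: ${res.status}`);
    const batch = (await res.json()) as Batch;

    // "completed" means the output file is ready to download;
    // "failed" / "expired" / "cancelled" are terminal errors.
    if (["completed", "failed", "expired", "cancelled"].includes(batch.status)) {
      return batch;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Once the status is `completed`, the actual responses can then be downloaded from `output_file_id` via the Files API.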
This probably won't be simple: it requires implementing a new mechanism of waiting for a response (polling) and a way to communicate that in the UI (maybe a spinner showing the response hasn't been generated yet). But I do think it would be a great, very useful addition. Plus, since OpenAI introduced this and Anthropic has now followed, we might see similar APIs from other providers, so if this is implemented, it might be a good idea to write generic code that can support other batch APIs in the future (a rough sketch below).
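One possible shape for that generic layer, purely as an assumption (all names here are made up for illustration), would be a small interface that each provider implements:

```typescript
// Hypothetical provider-agnostic interface; names are illustrative only.
interface BatchJob {
  id: string;
  done: boolean;   // reached a terminal state
  failed: boolean; // terminal state was an error
}

interface BatchProvider {
  // Submit a set of prompts as one batch and return a job handle.
  submit(requests: { customId: string; prompt: string }[]): Promise<BatchJob>;
  // Check the current status without blocking.
  poll(jobId: string): Promise<BatchJob>;
  // Fetch results once done (customId -> model output).
  fetchResults(jobId: string): Promise<Map<string, string>>;
}
```

That way the polling loop and the UI state would only depend on `BatchProvider`, and adding another vendor's batch API later would just mean writing a new adapter.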