ChatGPT-API-server
[New Feature] Add default retry to every `/api/ask` endpoint to utilize connection pool.
I think we can add a default number of retries to each incoming request to the `/api/ask` endpoint. Instead of returning `Content-Type: application/json`, we could return `Content-Type: text/event-stream`. With this change it might be slightly slower than the original, but we would try at least 3 times with different agents from the connection pool.

For example, with the proposed change the `/api/ask` endpoint's content type would be `text/event-stream` instead of `application/json`:
curl "http://localhost:8080/api/ask" -X POST --header 'Authorization: <API_KEY>' -d '{"content": "Hello world"}'
The default minimum number of retries is 3. In this example the `/api/ask` endpoint failed with the first two agents and succeeded with the third one:
```
data: retry #1 failed.
data: retry #2 failed.
data: {"message": { ... }, "conversation_id": " ... "}
data: [DONE]
```
I believe this would utilize the connection pool even better.
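A minimal sketch of how this could look in Go with Gin, assuming a channel-backed agent pool and a hypothetical `Agent.Ask` method (the server's real pool API may differ):

```go
package main

import (
	"errors"
	"fmt"

	"github.com/gin-gonic/gin"
)

const defaultRetries = 3

// Agent is a stand-in for a pooled browser/websocket agent; Ask is a
// hypothetical method, not the project's real API.
type Agent struct{ id int }

func (a *Agent) Ask(content string) (string, error) {
	// Placeholder: real code would forward the prompt over the agent's websocket.
	return "", errors.New("not implemented")
}

// agents is a channel-backed pool; receiving blocks until an agent is free.
var agents = make(chan *Agent, 16)

func askHandler(c *gin.Context) {
	var req struct {
		Content string `json:"content"`
	}
	if err := c.ShouldBindJSON(&req); err != nil {
		c.JSON(400, gin.H{"error": "invalid request body"})
		return
	}
	c.Header("Content-Type", "text/event-stream")
	for attempt := 1; attempt <= defaultRetries; attempt++ {
		agent := <-agents // next free agent; with several agents, retries rotate through them
		resp, err := agent.Ask(req.Content)
		agents <- agent // hand the agent back to the pool
		if err != nil {
			c.SSEvent("", fmt.Sprintf("retry #%d failed.", attempt))
			c.Writer.Flush()
			continue
		}
		c.SSEvent("", resp) // e.g. {"message": {...}, "conversation_id": "..."}
		c.SSEvent("", "[DONE]")
		return
	}
}

func main() {
	agents <- &Agent{id: 1} // register at least one agent for the demo
	r := gin.Default()
	r.POST("/api/ask", askHandler)
	r.Run(":8080")
}
```

Flushing after each event is what lets the client see the `retry #N failed.` lines as they happen instead of all at once at the end.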
The retry functionality has been added to https://github.com/ChatGPT-Hackers/ChatGPT-API-server/tree/dev. I'm still testing it out; there is a bug I don't know how to fix.
IMO, you need some sort of blocking mechanism, so you can quickly retrieve the websockets that are available at the moment, or passively wait for a new one to become available.

One way to do that is a kind of priority queue graded on a single parameter (available / not available), making sure it can take multi-threaded updates on whether a websocket is in use.

Also, a good feature would be to track the number of requests already made on a particular connection. I am not aware of the exact number of requests per hour that are permitted, but in my testing I have hit some limits within an hour. In that case we don't want to route requests to that websocket. (A sketch of both ideas follows below.)
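To illustrate, a rough Go sketch: a buffered channel serves as the availability queue, since receives block until some connection is marked free and channel operations are already safe across goroutines, and an atomic counter tracks per-connection requests. `hourlyLimit` is a placeholder, since the real limit is unknown:

```go
package main

import "sync/atomic"

const hourlyLimit = 50 // assumption: the real per-hour limit is unknown

// wsConn wraps a pooled websocket with a request counter, so the pool can
// avoid routing to connections that are close to a (guessed) rate limit.
type wsConn struct {
	// the underlying websocket would live here; omitted in this sketch
	requests atomic.Int64
}

// wsPool uses a buffered channel as the availability queue: a receive blocks
// until some connection is marked available ("passively wait"), and channel
// sends/receives need no extra locking.
type wsPool struct {
	available chan *wsConn
}

// acquire blocks until a connection is free, skipping ones near the limit.
func (p *wsPool) acquire() *wsConn {
	for {
		c := <-p.available
		if c.requests.Load() < hourlyLimit {
			c.requests.Add(1)
			return c
		}
		// Near the limit: don't hand it out. A real implementation would
		// re-queue the connection once its rate-limit window resets.
	}
}

// release marks the connection as available again.
func (p *wsPool) release(c *wsConn) {
	p.available <- c
}
```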
I was gonna write a queue system but I'm not quite sure how to implement it correctly. Gin is inherently multi-threaded and there is already a blocking mechanism in place for the connection pool though.
> Also, a good feature would be to track the number of requests already made on a particular connection. I am not aware of the exact number of requests per hour that are permitted, but in my testing I have hit some limits within an hour. In that case we don't want to route requests to that websocket.
Since it cycles by oldest connection first, each connection should see a similar number of requests. If limits are hit, all existing connections would likely also be rate limited on subsequent requests.
In my experience, one `conversation_id` is bound to one OpenAI account, and one `conversation_id` can be used multiple times, representing a long multi-round conversation. So different accounts may be used at different frequencies.
Another observation: when the API server has not received a request for a period of time, the first request returns `{"id": "65f76efa-e0cb-47c1-a054-6f6b5fd5888d", "message": "error", "data": "Wrong response code"}`, but the immediately following request returns normally. My guess is that the connection becomes invalid after sitting idle for a long time (does the Firefox tab need a refresh?).
If we track the request rate of each account, then:
- if one account is rate limited, we can redirect requests to other accounts
- if one account has had no traffic for a period of time (such as 10 minutes), we can send a fake request to keep it alive (see the sketch after this list)
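A minimal sketch of the keep-alive idea in Go, with hypothetical per-connection bookkeeping (`idleConn` and `ping` are illustrative names, not the server's actual types):

```go
package main

import (
	"sync"
	"time"
)

// idleConn is a hypothetical wrapper that remembers when a pooled websocket
// last carried a real request.
type idleConn struct {
	mu   sync.Mutex
	last time.Time
}

func (c *idleConn) touch() {
	c.mu.Lock()
	c.last = time.Now()
	c.mu.Unlock()
}

func (c *idleConn) idleFor() time.Duration {
	c.mu.Lock()
	defer c.mu.Unlock()
	return time.Since(c.last)
}

// ping would send a throwaway prompt over the websocket; stubbed here.
func (c *idleConn) ping() { c.touch() }

// keepAlive checks once a minute and sends a fake request to any connection
// that has been idle longer than idleAfter (e.g. 10 minutes).
func keepAlive(conns []*idleConn, idleAfter time.Duration) {
	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop()
	for range ticker.C {
		for _, c := range conns {
			if c.idleFor() > idleAfter {
				go c.ping()
			}
		}
	}
}
```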
The error handling can be done on the client side: if you get a `"message": "error"` response, sleep a second or two and then try again. Doing this from the server could clog up the connection and compete with actual requests, introducing additional downtime and errors.
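For illustration, a client-side retry loop in Go against the error body shown above; the endpoint and error shape are taken from this thread, while the retry count and backoff are arbitrary:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// ask posts to /api/ask and retries when the body comes back with
// "message": "error" (the shape shown earlier in this thread).
func ask(apiKey, content string) ([]byte, error) {
	body, _ := json.Marshal(map[string]string{"content": content})
	var lastErr error
	for attempt := 0; attempt < 3; attempt++ {
		req, err := http.NewRequest("POST", "http://localhost:8080/api/ask", bytes.NewReader(body))
		if err != nil {
			return nil, err
		}
		req.Header.Set("Authorization", apiKey)
		resp, err := http.DefaultClient.Do(req)
		if err == nil {
			data, readErr := io.ReadAll(resp.Body)
			resp.Body.Close()
			if readErr == nil {
				var parsed struct {
					Message string `json:"message"`
				}
				if json.Unmarshal(data, &parsed) == nil && parsed.Message == "error" {
					lastErr = fmt.Errorf("server error: %s", data)
				} else {
					return data, nil // success
				}
			} else {
				lastErr = readErr
			}
		} else {
			lastErr = err
		}
		time.Sleep(2 * time.Second) // "sleep a second or two and then try again"
	}
	return nil, lastErr
}
```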
> if one account is rate limited, we can redirect requests to other accounts
Possible. Will consider it.
How about the other one: regularly sending fake requests to idle connections?
> How about the other one: regularly sending fake requests to idle connections?
I'm not sure whether regularly sending fake requests will actually help keep the connection alive. Ignore me.