ChatGPT-API-server

[New Feature] Add default retry to every `/api/ask` endpoint to utilize connection pool.

[Open] ahmetkca opened this issue 2 years ago • 11 comments

I think we can add a default number of retries to each incoming request to the /api/ask endpoint. Instead of returning Content-Type: 'application/json' we could return Content-Type: 'text/event-stream'. With this change it might be slightly slower than the original, but we would try at least 3 times with different agents from the connection pool.

For example, under the proposed change the /api/ask endpoint's content type would be text/event-stream instead of application/json:

curl "http://localhost:8080/api/ask" -X POST --header 'Authorization: <API_KEY>' -d '{"content": "Hello world"}'

The default minimum number of retries is 3.

In this example, the /api/ask endpoint failed with the first two agents and succeeded with the third:

data: retry #1 failed.

data: retry #2 failed.

data: {"message": { ... }, "conversation_id": " ... "}

data: [DONE]

I believe this would utilize the connection pool even better.

ahmetkca avatar Dec 28 '22 19:12 ahmetkca
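For reference, a Gin handler along these lines might look like the sketch below. It is only an illustration: the Agent and Pool types and the Ask method are assumptions for this example, not the repository's actual API.

```go
package main

import (
	"fmt"
	"io"
	"net/http"

	"github.com/gin-gonic/gin"
)

// Agent and Pool are stand-ins for the server's real connection-pool
// types; Ask is assumed to forward a prompt over the agent's websocket.
type Agent struct{}

func (a *Agent) Ask(prompt []byte) (string, error) { return "", nil }

type Pool struct{ agents chan *Agent }

func (p *Pool) Get() *Agent  { return <-p.agents } // blocks until one is free
func (p *Pool) Put(a *Agent) { p.agents <- a }

func ask(pool *Pool) gin.HandlerFunc {
	return func(c *gin.Context) {
		body, _ := io.ReadAll(c.Request.Body)
		c.Header("Content-Type", "text/event-stream")
		c.Status(http.StatusOK)

		const maxRetries = 3
		for attempt := 1; attempt <= maxRetries; attempt++ {
			agent := pool.Get()
			resp, err := agent.Ask(body)
			pool.Put(agent)
			if err != nil {
				// Stream the failure as its own SSE event, then retry
				// with the next agent from the pool.
				fmt.Fprintf(c.Writer, "data: retry #%d failed.\n\n", attempt)
				c.Writer.Flush()
				continue
			}
			fmt.Fprintf(c.Writer, "data: %s\n\n", resp)
			break
		}
		fmt.Fprint(c.Writer, "data: [DONE]\n\n")
		c.Writer.Flush()
	}
}
```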

The retry functionality has been added to https://github.com/ChatGPT-Hackers/ChatGPT-API-server/tree/dev

acheong08 avatar Dec 29 '22 06:12 acheong08

I'm still testing it out

acheong08 avatar Dec 29 '22 06:12 acheong08

There is a bug I don't know how to fix

acheong08 avatar Dec 29 '22 08:12 acheong08

IMO, you need some sort of blocking mechanism, so that you can quickly retrieve whichever websockets are available at the moment, or else passively wait for a new one to become available.

One way to do that is a sort of priority queue graded on a single parameter (available / not available), making sure that updates on whether a websocket is in use can be written to it from multiple threads.

Also, a good feature to have would be tracking the number of requests already made to a particular connection. I am not aware of the exact number of requests per hour that is permitted, but in my testing I have hit limits within an hour. In that case we don't really want to route requests to that websocket.

0xRaduan avatar Dec 29 '22 09:12 0xRaduan
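In Go, a buffered channel already provides this kind of blocking availability queue, so a separate priority queue may not be necessary. A minimal sketch (the Agent type and request counter are illustrative assumptions, not the repository's code):

```go
package main

import "sync/atomic"

// Agent wraps one browser websocket connection.
type Agent struct {
	Requests int64 // requests already routed to this connection
}

// Pool hands out available agents; the buffered channel is the queue.
type Pool struct {
	available chan *Agent
}

func NewPool(size int) *Pool {
	return &Pool{available: make(chan *Agent, size)}
}

// Get blocks until some agent is free; safe to call from many goroutines.
func (p *Pool) Get() *Agent {
	a := <-p.available
	atomic.AddInt64(&a.Requests, 1)
	return a
}

// Put marks the agent available again.
func (p *Pool) Put(a *Agent) {
	p.available <- a
}
```

Because the channel is FIFO, Get naturally hands out the least-recently-used connection first, which also spreads requests evenly across agents.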

I was gonna write a queue system but I'm not quite sure how to implement it correctly. Gin is inherently multi-threaded and there is already a blocking mechanism in place for the connection pool though.

acheong08 avatar Dec 29 '22 10:12 acheong08

Also, a good feature to have would be tracking the number of requests already made to a particular connection. I am not aware of the exact number of requests per hour that is permitted, but in my testing I have hit limits within an hour. In that case we don't really want to route requests to that websocket.

Since it cycles through connections oldest first, each connection should see a similar number of requests. If limits are hit, all existing connections would also be rate limited on subsequent requests.

acheong08 avatar Dec 29 '22 10:12 acheong08

In my experience, one conversation_id is bound to one OpenAI account, and one conversation_id can be used multiple times, representing a long multi-round conversation. So different accounts may end up being used at different frequencies.

Another observation: when the API server has not handled a request for a period of time, the first request returns {"id": "65f76efa-e0cb-47c1-a054-6f6b5fd5888d", "message": "error", "data": "Wrong response code"}, but the immediately following request returns normally. My guess is that the connection becomes invalid after being idle for a long time (the Firefox tab needs a refresh?).

If we track the request rate of each account, then:

  • if one account is rate limited, we can redirect requests to other accounts
  • if one account has been idle for a period of time (such as 10 minutes), we can send a fake request to keep it alive (see the sketch after this comment)

icycandy avatar Jan 04 '23 07:01 icycandy
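A minimal sketch of that second idea in Go (the Agent type and Ask method are hypothetical stand-ins, not the repository's code, and whether a fake request actually keeps the session alive is only a guess here):

```go
package main

import "time"

// Agent is a stand-in for one account's connection; Ask is assumed to
// forward a prompt over its websocket. Real code would need to guard
// LastUsed against concurrent updates.
type Agent struct {
	LastUsed time.Time
}

func (a *Agent) Ask(prompt []byte) (string, error) { return "", nil }

// keepAlive sends a throwaway prompt over any agent that has been
// idle for longer than maxIdle, so its session does not go stale.
func keepAlive(agents []*Agent, maxIdle time.Duration) {
	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop()
	for range ticker.C {
		for _, a := range agents {
			if time.Since(a.LastUsed) > maxIdle {
				a.Ask([]byte(`{"content": "ping"}`)) // fake request
				a.LastUsed = time.Now()
			}
		}
	}
}
```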

The error handling can be done on the client side: if you get an "error" message, sleep a second or two and then try again. Doing this from the server could clog up the connections and compete with actual requests, introducing additional downtime and errors.

acheong08 avatar Jan 04 '23 08:01 acheong08
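A client-side retry along those lines might look like this sketch (the endpoint and payload follow the curl example above; everything else, including the error-detection string, is illustrative):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// askWithRetry posts a prompt to /api/ask and retries after a short
// sleep whenever the server answers with an "error" message.
func askWithRetry(apiKey, prompt string) ([]byte, error) {
	for attempt := 0; attempt < 3; attempt++ {
		req, err := http.NewRequest(http.MethodPost,
			"http://localhost:8080/api/ask",
			strings.NewReader(fmt.Sprintf(`{"content": %q}`, prompt)))
		if err != nil {
			return nil, err
		}
		req.Header.Set("Authorization", apiKey)

		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			time.Sleep(2 * time.Second) // transient failure: back off, retry
			continue
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		if bytes.Contains(body, []byte(`"message": "error"`)) {
			time.Sleep(2 * time.Second) // server-side error: back off, retry
			continue
		}
		return body, nil
	}
	return nil, fmt.Errorf("all retries failed")
}
```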

if one account is rate limited, we can redirect requests to other accounts

Possible. Will consider it.

acheong08 avatar Jan 04 '23 08:01 acheong08

How about the other one: regularly sending a fake request to idle connections?

icycandy avatar Jan 04 '23 11:01 icycandy

How about the other one: regularly sending a fake request to idle connections?

I'm not sure whether regularly sending fake requests would actually help keep the connection alive. Ignore me.

icycandy avatar Jan 04 '23 13:01 icycandy