gradio
Enable a maximum length for the queue
Is your feature request related to a problem? Please describe.
If a hosted Gradio demo or a Space is too popular, the queue can get out of hand. I understand the queue refactor backend (https://github.com/gradio-app/gradio/pull/1489) will make this much better, but even with better handling and de-registering of quitters, very popular Spaces may still end up with queues so long that the application becomes unusable. A few applications that deal with high traffic take a different approach: rather than queuing everybody, they reward the most resilient users, who stay on the page clicking "run" until they grab a spot. A 2h queue that would be 'fair' to all users would probably not work anyway.
Describe the solution you'd like
Create a setting that enables a maximum length for the queue. Once the queue reaches that maximum, users who try to run the Space get a "Space too busy, the queue is full, try again" message instead of being registered in the queue. This rewards 'resilient' users and forces the queue to stay short.
This maximum length could be either time-based (force the queue to always take x minutes or less, based on the estimated duration of a task) or request-count-based (the queue can hold at most y requests).
A reasonable value (say, 15 minutes) could be active by default.
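To make the two proposed limits concrete, here is a minimal sketch in plain Python. It is not Gradio's implementation: `BoundedQueue`, `QueueFull`, `max_eta_seconds`, and `avg_task_seconds` are hypothetical names chosen for illustration, and the ETA assumes a single worker processing tasks one at a time.

```python
from collections import deque
from typing import Optional


class QueueFull(Exception):
    """Raised when a new request would exceed a configured limit."""


class BoundedQueue:
    """Hypothetical queue that rejects new requests once a limit is reached."""

    def __init__(self, max_size: Optional[int] = None,
                 max_eta_seconds: Optional[float] = None,
                 avg_task_seconds: float = 1.0):
        self.max_size = max_size                  # request-count limit (y requests)
        self.max_eta_seconds = max_eta_seconds    # time limit (x minutes, in seconds)
        self.avg_task_seconds = avg_task_seconds  # assumed average task duration
        self._items = deque()

    def estimated_wait(self) -> float:
        # Wait a new arrival would face, assuming one task runs at a time.
        return len(self._items) * self.avg_task_seconds

    def join(self, request) -> int:
        """Register a request and return its 1-based position, or raise QueueFull."""
        if self.max_size is not None and len(self._items) >= self.max_size:
            raise QueueFull("Space too busy, the queue is full, try again")
        if (self.max_eta_seconds is not None
                and self.estimated_wait() > self.max_eta_seconds):
            raise QueueFull("Space too busy, the queue is full, try again")
        self._items.append(request)
        return len(self._items)

    def pop(self):
        # Called by the worker when it picks up the next request.
        return self._items.popleft()
```

A rejected user simply retries later; nothing is stored for them, which is what keeps the queue short.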
Additional context
Internal discussion: https://huggingface.slack.com/archives/C02QZLG8GMN/p1658512308000649
I like this idea quite a bit. We could even have a max length set up by default, as I don't think a queue is useful anymore once it holds 50 requests.
We probably won't add a default for this, but we will let people configure a queue-eta limit on a Space, and the frontend will warn users accordingly when queue_eta is bigger than the threshold.
queue-eta is meaningful for putting a limit on user wait time. queue-size is meaningful for putting a limit on communication overhead (but not necessary, because that could be handled by configuring queue-info-update-intervals).
Converted the issue into max-eta, which will probably be needed to balance autoscaling.
It looks like we have already implemented this, although with a max queue size instead of a max eta. I haven't seen any requests for an eta specifically, so I think we can close this (anyway, eta calculations can be a little tricky when you have multiple functions in one Interface, so best to stick with queue size imo).
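A rough illustration of why an eta limit is harder to reason about than a size limit when several functions share one queue. This is only a sketch under assumptions: `estimate_eta`, the function names, and the average durations below are all made up for the example and are not part of Gradio.

```python
from collections import deque


def estimate_eta(queued_fn_names, avg_seconds_by_fn, default_seconds=1.0):
    """Sum the assumed average duration of every queued call (single worker)."""
    return sum(avg_seconds_by_fn.get(name, default_seconds)
               for name in queued_fn_names)


# Hypothetical per-function averages, in seconds.
avg = {"caption": 2.0, "generate": 30.0}

# Two queues of the same size with very different waits.
mixed = deque(["caption", "generate", "caption"])
light = deque(["caption", "caption", "caption"])

print(estimate_eta(mixed, avg))  # 34.0
print(estimate_eta(light, avg))  # 6.0
```

Both queues hold three requests, so a size limit treats them identically, while an eta limit depends on which functions happen to be queued and on how good the per-function averages are.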