gradio [Feature request] Gradio should provide a way to restore a previous task session (by something like a task ID), so the server and client don't need to keep the connection alive

Open garywill opened this issue 2 years ago • 4 comments

[ ] I have searched to see if a similar issue already exists.

Is your feature request related to a problem? Please describe.
Sometimes our Internet connection is slow or unstable, and we have tasks that need a long time (10 hours) of CPU and GPU work on Hugging Face. I've hit network breaks many times while using the Python gradio_client. A break can be caused by the ISP or by anything else outside Gradio's control. Because we're running long tasks, our server and client code should take that into account.

After the network recovers, I open the Hugging Face Space on the web. I can see from the web log that the task kept running and finished successfully. But the gradio client can't download the result, just because of a 5-minute network break.

Describe the solution you'd like
Gradio should provide a way to restore a previous "task session" (through something like a "task ID" mechanism), so the client can download the result. (I'm not asking Gradio to improve its code to keep the connection alive or to prevent network breaks. An unstable Internet connection is something we can't control. The ISP can cut our connection at any time while we're trying to run a 10-hour task.)

Additional context

Nov 17 '23 15:11 garywill

Hi @garywill can you please provide a repro for us to investigate this issue?

Nov 21 '23 09:11 abidlabs

I think the point is that the Gradio server should give the client a job ID when it receives a request. Later, the client can fetch the result using that job ID. Currently, client.predict() in the Python gradio_client requires the connection to stay alive, which should not be necessary.

Currently:

The Gradio server doesn't react when it loses the connection to the client.
The job continues on the server until it finishes.
The client can't fetch the job result because the connection was lost, which wastes server CPU time.
After the network recovers, we use the client to submit the request again, and the server runs the same job again.
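The job ID idea described above can be sketched in plain Python. This is only an illustration of the requested behaviour, not Gradio's API: `JobStore`, its `submit`/`status`/`result` methods, and `slow_square` are hypothetical names, and an in-memory dict stands in for whatever storage a real server would use.

```python
import threading
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, Optional


class JobStore:
    """Hypothetical server-side registry: results are keyed by job id, not by connection."""

    def __init__(self, max_workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: Dict[str, Future] = {}
        self._lock = threading.Lock()

    def submit(self, fn, *args) -> str:
        """Start the job and immediately return an id the client can store."""
        job_id = uuid.uuid4().hex
        future = self._pool.submit(fn, *args)
        with self._lock:
            self._jobs[job_id] = future
        return job_id

    def status(self, job_id: str) -> str:
        """Report whether the job is still running; needs only the id."""
        with self._lock:
            future = self._jobs[job_id]
        return "finished" if future.done() else "running"

    def result(self, job_id: str, timeout: Optional[float] = None):
        """Fetch the result at any later time, from any connection."""
        with self._lock:
            future = self._jobs[job_id]
        return future.result(timeout=timeout)


def slow_square(x: int) -> int:
    # Stand-in for a long-running CPU/GPU task.
    return x * x


store = JobStore()
job_id = store.submit(slow_square, 12)
# The client could disconnect here; only job_id has to survive on its side.
print(store.result(job_id, timeout=5))  # prints 144
```

Because the result is looked up by ID rather than delivered over the original connection, the job never has to be resubmitted after a disconnect.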

Nov 22 '23 02:11 garywill

Hi @garywill sorry for the long delay. I've just been tackling some issues related to the client and came across this. I suspect that this may have been solved when we migrated from websockets to SSE but am not sure. Do you know what versions of Gradio / Gradio Client you were using? Are you still facing this issue?

Mar 15 '24 15:03 abidlabs

Hi @abidlabs , thank you for replying.

Sorry, I didn't make myself clear. I was not asking Gradio to improve its code to keep the connection alive or to prevent network breaks. An unstable Internet connection is something we can't control. The ISP can cut our connection at any time while we're trying to run a 10-hour task.

I was filing a feature request: Gradio should provide a way to restore a previous "task session" (through something like a "task ID" mechanism), so the server and client don't need to keep the connection alive.

After seeing your reply, I upgraded to the latest gradio-4.21.0 and gradio_client-0.12.0, and then ran a simple test:

Use gradio_client to submit a task to the Gradio server. The client waits for the result from the server. A "task session" has been created and is being processed.
Unplug the network cable while the client is waiting for the server's result.

Apparently, the client quickly exited and threw an error. And there's no way to restore the broken "task session", because there's currently no "task ID" mechanism.
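For contrast, here is a rough sketch of how a client could behave if such a "task ID" mechanism existed. Everything in it is an assumption made for illustration: `FlakyServer` simulates the unplugged cable by raising `ConnectionError`, and `fetch_with_resume` is a hypothetical helper, not part of gradio_client.

```python
import time


class FlakyServer:
    """Stand-in for a remote server; the first `failures` calls raise ConnectionError."""

    def __init__(self, failures: int, answer: str) -> None:
        self.failures = failures
        self.answer = answer
        self.calls = 0

    def fetch(self, task_id: str) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated network break")
        return f"{task_id}:{self.answer}"


def fetch_with_resume(fetch, task_id: str, retries: int = 5, delay: float = 0.01) -> str:
    """Retry fetching the same task id instead of exiting or resubmitting the job."""
    last_error = None
    for _ in range(retries):
        try:
            return fetch(task_id)
        except ConnectionError as exc:
            last_error = exc
            time.sleep(delay)  # wait for the network to come back
    raise RuntimeError(f"gave up on task {task_id}") from last_error


server = FlakyServer(failures=3, answer="done")
result = fetch_with_resume(server.fetch, "task-42")
print(result)  # prints task-42:done
```

The client survives three simulated breaks and still gets the result on the fourth attempt, without the job being submitted a second time.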

Mar 15 '24 16:03 garywill
