
Set timeout in the server

Open ivanaclairineirsan opened this issue 6 months ago • 3 comments

I'd like to suggest adding a connection timeout so the server does not hang when the client crashes mid-session. [metaflow/plugins/env_escape/server.py#L283](https://github.com/Netflix/metaflow/blob/79b7aa8e02a300b2e9c2c6a2c4ee2e2d851dee7b/metaflow/plugins/env_escape/server.py#L283)

If the client hangs in the middle of a session, the server waits forever and cannot shut down properly, because it has no way of knowing that it has already lost the client.

ivanaclairineirsan avatar Jun 16 '25 07:06 ivanaclairineirsan

Good point. We can make this more robust. May I ask how you are using the escape hatch? It's a fairly obscure feature (we use it quite a bit, but I'm curious about other use cases as well).

Feel free to open a PR but I'll also try to take a look at it.

romain-intel avatar Jun 20 '25 07:06 romain-intel

@ivanaclairineirsan @romain-intel, if no one is working on this, I can take it. I believe the simple fix would be to add a line socket.settimeout(10) before line 283 in the metaflow/plugins/env_escape/server.py file.

I'm not sure whether we want to add a test case for this, as this is a standard library function.
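
For illustration, here is a minimal sketch of what a receive timeout looks like with the standard library. It is not the actual code in server.py: the helper name `recv_with_timeout`, the buffer size, and the short timeout values are assumptions made for the example. Note that `settimeout` is a method on a socket object, so in the real fix it would be called on whichever connection object the server reads from.

```python
import socket


def recv_with_timeout(conn, timeout_s, bufsize=4096):
    """Hypothetical helper: read from conn, giving up after timeout_s seconds.

    Returns the received bytes, b"" if the peer closed the connection,
    or None if nothing arrived before the timeout expired.
    """
    # settimeout applies to all subsequent blocking operations on this socket.
    conn.settimeout(timeout_s)
    try:
        return conn.recv(bufsize)
    except socket.timeout:
        # Alias of TimeoutError on Python 3.10+, a subclass of OSError before.
        return None


if __name__ == "__main__":
    # socketpair gives two connected sockets, standing in for server and client.
    server_side, client_side = socket.socketpair()

    client_side.sendall(b"ping")
    print(recv_with_timeout(server_side, 0.5))  # data is available

    # The client goes silent: without a timeout this recv would block forever.
    print(recv_with_timeout(server_side, 0.2))  # timeout expires

    client_side.close()
    server_side.close()
```

A connected socket pair like this could also serve as the basis for a small test, since it exercises the timeout path without needing a real client process.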

patel-lay avatar Jul 28 '25 23:07 patel-lay

I tried to push the changes to open a PR, but I don’t have permission to push to this repository. Can you advise me on how to proceed?

As for the use of the escape hatch, I think it is mainly connected to the conda step in the workflow. The lack of a server timeout sometimes causes workflows to hang for a long time, particularly when the step is defined with missing dependencies or incompatible versions.

On a side note, would it be okay to request a CVE number for this issue? Thanks!

ivanaclairineirsan avatar Sep 11 '25 08:09 ivanaclairineirsan