Long-running request retried automatically?

Open ElSrJuez opened this issue 4 months ago • 1 comments

I am sending a long request (Phi 3.5 mini). After a few minutes it seemingly failed and was retried automatically. I found no retry logic in my code, so I guess it must be Foundry itself or the Python OpenAI SDK?

2025-08-19 20:52:40,811 - INFO - HTTP Request: GET http://localhost:5273/foundry/list "HTTP/1.1 200 OK"
2025-08-19 20:52:40,939 - INFO - HTTP Request: GET http://localhost:5273/openai/models "HTTP/1.1 200 OK"
2025-08-19 20:52:44,997 - INFO - HTTP Request: GET http://localhost:5273/openai/load/Phi-3.5-mini-instruct-generic-cpu?ttl=600 "HTTP/1.1 200 OK"
2025-08-19 21:02:45,468 - INFO - Retrying request to /chat/completions in 0.413122 seconds
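To see how long the request ran before the retry, the gap between the last two log timestamps can be computed directly. This is a minimal sketch using only the two timestamps from the log above; the helper name `seconds_between` is made up for illustration.

```python
from datetime import datetime

# Timestamp layout used by the log lines above (comma before the milliseconds).
LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start: str, end: str) -> float:
    """Return the number of seconds elapsed between two log timestamps."""
    t0 = datetime.strptime(start, LOG_FORMAT)
    t1 = datetime.strptime(end, LOG_FORMAT)
    return (t1 - t0).total_seconds()


# Model load response vs. the "Retrying request" line.
gap = seconds_between("2025-08-19 20:52:44,997", "2025-08-19 21:02:45,468")
print(f"{gap:.3f} s")
```

The gap is almost exactly 600 seconds. That matches the `ttl=600` in the load URL, and, as far as I know, it also matches the default request timeout of the OpenAI Python SDK, so the log alone does not tell which of the two triggered the retry.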

The problem is that when the request is retried before the first one has completed, it simply continues forever, and the end result is a hung app and a hung Foundry service.

I literally had to stop the service for the CPU to go back down to normal.
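If the retry does come from the Python OpenAI SDK, one way to rule it out is to create the client with retries disabled and a longer timeout. The sketch below is a possible workaround, not a confirmed fix. The `max_retries` and `timeout` arguments are real `OpenAI` constructor options; the `/v1` base path, the placeholder API key, the 30-minute timeout, and the helper names are assumptions for illustration.

```python
from typing import Any


def no_retry_client_kwargs(base_url: str, timeout_s: float) -> dict[str, Any]:
    """Build OpenAI client options with automatic retries turned off."""
    return {
        "base_url": base_url,      # assumed local endpoint, adjust to your setup
        "api_key": "not-needed",   # placeholder; assumes the local server ignores it
        "max_retries": 0,          # never resend a request that timed out
        "timeout": timeout_s,      # per-request timeout in seconds
    }


def make_client(base_url: str, timeout_s: float):
    """Create the client; requires the `openai` package to be installed."""
    from openai import OpenAI  # imported lazily so the helper above stays stdlib-only

    return OpenAI(**no_retry_client_kwargs(base_url, timeout_s))


# Hypothetical values: port taken from the log above, 30-minute timeout.
kwargs = no_retry_client_kwargs("http://localhost:5273/v1", 1800.0)
print(kwargs["max_retries"], kwargs["timeout"])
```

With `max_retries=0` a timeout surfaces as an exception in the app instead of a second request being sent to a server that is still busy with the first one.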

Aug 19 '25 19:08 ElSrJuez

Hi @ElSrJuez, it looks like the retry occurred after 10 minutes, which is the default TTL sent to the model server. The load operation should not take 10 minutes. Is this behavior repeatable?

Sep 27 '25 20:09 natke