Long-running request retried automatically?
I am sending a long-running request (Phi 3.5 mini), and after a few minutes it seemingly failed and was retried automatically. I found no retry logic in my code, so I guess it must be Foundry itself or the Python OpenAI SDK?
2025-08-19 20:52:40,811 - INFO - HTTP Request: GET http://localhost:5273/foundry/list "HTTP/1.1 200 OK"
2025-08-19 20:52:40,939 - INFO - HTTP Request: GET http://localhost:5273/openai/models "HTTP/1.1 200 OK"
2025-08-19 20:52:44,997 - INFO - HTTP Request: GET http://localhost:5273/openai/load/Phi-3.5-mini-instruct-generic-cpu?ttl=600 "HTTP/1.1 200 OK"
2025-08-19 21:02:45,468 - INFO - Retrying request to /chat/completions in 0.413122 seconds
The problem is that if the request is retried before the original has completed, it seems to keep running forever, and the end result is a hung app and a hung Foundry service.
I literally had to stop the service for CPU usage to go back down to normal.
Hi @ElSrJuez, it looks like the retry occurred after 10 minutes, which is the default TTL sent to the model server (`ttl=600` in the load request). The load operation should not take 10 minutes. Is this behavior repeatable?
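To help narrow down where the retry comes from: the OpenAI Python SDK retries failed requests on its own (2 retries by default) and uses a default request timeout of 10 minutes, which would also line up with the gap in the log above. A minimal sketch for ruling the SDK in or out, assuming the app builds its client with `openai.OpenAI`; the helper names, the base URL, and the placeholder API key are hypothetical and need to be adapted to your setup:

```python
from typing import Any, Dict


def client_kwargs(base_url: str,
                  timeout_s: float = 1800.0,
                  max_retries: int = 0) -> Dict[str, Any]:
    """Build keyword arguments for openai.OpenAI (hypothetical helper).

    max_retries=0 turns off the SDK's automatic retries, so a timeout
    surfaces as an exception instead of a second in-flight request.
    timeout_s raises the per-request timeout above the SDK default.
    """
    return {
        "base_url": base_url,
        # Placeholder value; assumption: the local endpoint does not check it.
        "api_key": "not-needed",
        "timeout": timeout_s,
        "max_retries": max_retries,
    }


def make_client(base_url: str):
    """Create the client; openai is imported lazily so the helper above stays dependency-free."""
    from openai import OpenAI

    return OpenAI(**client_kwargs(base_url))
```

Usage would look like `client = make_client("http://localhost:5273/v1")` (the URL is an assumption; use whatever endpoint your app already passes), followed by the existing `client.chat.completions.create(...)` call. If the "Retrying request" line disappears and a timeout error shows up instead, the retry was coming from the SDK rather than from Foundry.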