[question] How to properly handle client request cancellation during inference?
Hey all,
My model's inference is quite long-running (around 50 seconds per request), so it would be great if closed client connections were handled properly by interrupting the inference that is currently in progress. I'm currently implementing the `initialize`, `preprocess`, `inference` and `postprocess` methods in my custom handler class. What's the proper place for detecting a closed connection, if that is possible?
Thanks, Miro
@miroslavLalev There are two model-level configuration parameters that address long-running inference requests:
- responseTimeout: prevents the TorchServe frontend from disconnecting from the backend worker (i.e. the model handler) while a long inference is still in progress.
- clientTimeoutInMills: when the client connection times out, TorchServe will either skip processing the request if it is still pending in the frontend queue, or stop sending the response to the client if the response has already been received from the backend worker.
Both parameters can be set in model-config.yaml.
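For example, a model-config.yaml along these lines should cover the roughly 50-second inference described above. This is only a sketch: the timeout values are illustrative, responseTimeout is specified in seconds, and clientTimeoutInMills (as the name suggests) in milliseconds.

```yaml
# model-config.yaml -- illustrative values only
minWorkers: 1
maxWorkers: 1

# Give the backend worker up to 120 s to respond before the frontend
# treats it as unresponsive (must exceed the ~50 s inference time).
responseTimeout: 120

# Consider the client connection timed out after 60 s: pending requests
# are skipped, and responses already produced by the worker are no
# longer sent back to the client.
clientTimeoutInMills: 60000
```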
@miroslavLalev
I tried `responseTimeout=5` (my model's inference time is 10 s). After calling the TorchServe inference endpoint I found a log like this:
2024-03-13T23:40:02,848 [ERROR] W-9004-bert4rec_240314-083734 org.pytorch.serve.wlm.WorkerThread - Number or consecutive unsuccessful inference 1
2024-03-13T23:40:02,857 [ERROR] W-9004-bert4rec_240314-083734 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Backend worker did not respond in given time
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:230) [model-server.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
2024-03-13T23:40:02,966 [INFO ] epollEventLoopGroup-5-2 org.pytorch.serve.wlm.WorkerThread - 9004 Worker disconnected. WORKER_MODEL_LOADED
...
2024-03-13T23:40:02,971 [INFO ] W-9004-bert4rec_240314-083734 org.pytorch.serve.wlm.WorkerThread - Auto recovery start timestamp: 1710373202971
But auto recovery fails again and again:
2024-03-13T23:41:05,887 [WARN ] W-9004-bert4rec_240314-083734 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
Is this a normal situation or a bug? Do you know how to fix it?
@gukwonku please set your `responseTimeout` to be greater than the model's inference time.
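For the 10 s model above, that would mean something like the following in model-config.yaml (the value is illustrative; anything comfortably above 10 s works):

```yaml
# responseTimeout must exceed the ~10 s inference time; 60 s leaves headroom.
responseTimeout: 60
```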
Also, the worker recovery issue has been fixed and is included in the latest release: https://github.com/pytorch/serve/releases/tag/v0.10.0