When will a new worker be started by TorchServe?
[Question]
As here mentioned:
max_worker is the parameter specifying that TorchServe will create no more than this number of workers for the specified model.
Does that mean that TorchServe will automatically start new workers for the registered model during inference, as long as there is enough GPU memory?
Suppose we have a configuration like the one below:
models={\
"network": {\
"1.0": {\
"defaultVersion": true,\
"marName": "network.mar",\
"minWorkers": 1,\
"maxWorkers": 4,\
"batchSize": 1,\
"maxBatchDelay": 100,\
"responseTimeout": 120\
}\
}\
}
I monitored my server and noticed that TorchServe always started only 1 worker for network.mar, even though there was enough GPU memory to start 4 workers.
Thanks in advance for your help and explanation.
Hi @NormXU, just to clarify: if you set minWorkers=4, or some number larger than 1, then you get the behavior you expect, i.e. a larger memory allocation on the GPU? But when it's equal to 1, you're observing that maxWorkers has no impact?
@msaroufim Exactly. I expected TorchServe to automatically start or kill workers according to the remaining GPU memory, with maxWorkers being the largest number of workers a model can start. However, my experiments showed that it does not work this way.
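For reference, my reading of this behavior (an assumption based on the experiments above, not an authoritative statement of TorchServe's scheduler): TorchServe launches exactly minWorkers workers when the model is registered, and maxWorkers acts only as an upper bound for later scaling, not as an autoscaling target driven by free GPU memory. Under that assumption, a config.properties fragment that starts all 4 workers up front for the same hypothetical network.mar model would look like:

```properties
# Sketch: start 4 workers at registration time by raising minWorkers,
# since maxWorkers alone does not trigger additional workers.
models={\
  "network": {\
    "1.0": {\
      "defaultVersion": true,\
      "marName": "network.mar",\
      "minWorkers": 4,\
      "maxWorkers": 4,\
      "batchSize": 1,\
      "maxBatchDelay": 100,\
      "responseTimeout": 120\
    }\
  }\
}
```

Workers can also be adjusted at runtime through the management API, e.g. `curl -X PUT "http://localhost:8081/models/network?min_worker=4&synchronous=true"` (assuming the default management port 8081).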