Limiting resources in Docker Compose and workers in the model
📚 The doc issue
I understand that the number of workers for my model should be smaller than the number of physical cores (in CPU mode). However, when I limit the resources of my Docker container using the following configuration:
```yaml
deploy:
  resources:
    limits:
      cpus: '1'
      memory: 20G
```
I set the CPU limit to 1 core, but my model is running with workers = 16. How is this possible? When I go into the container and run:
```python
>>> import os
>>> os.cpu_count()
16
```
I see that it returns 16. I'm not sure if this is okay for my model, and I'm concerned whether it will have a negative impact on inference when I scale it.
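For reference, here is the same check extended to read the cgroup CPU quota directly (a minimal sketch that assumes cgroup v2; the files have different names under cgroup v1):

```python
import os

# Python reports the host's core count, regardless of the container's CPU limit.
print("os.cpu_count():", os.cpu_count())  # 16 in my case

# The quota actually enforced on the container (cgroup v2 layout).
# cpu.max contains "<quota> <period>", e.g. "100000 100000" for cpus: '1'.
try:
    with open("/sys/fs/cgroup/cpu.max") as f:
        quota, period = f.read().split()
    if quota != "max":
        print("cgroup CPU limit:", int(quota) / int(period))  # 1.0 here
    else:
        print("no cgroup CPU limit set")
except FileNotFoundError:
    print("cpu.max not found (probably cgroup v1)")
```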
Suggest a potential alternative/fix
No response
Hi @ToanLyHoa, the way to limit workers in TorchServe is to pass `initial_workers=1` when registering the model or to set `minWorkers=1` in the model configuration. If applying Kubernetes limits is not having an effect, that would be a good question for Kubernetes, I guess.
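For example, here is a minimal sketch of doing that through the management API from Python, assuming the default management port 8081 and a hypothetical model archive named model.mar registered under the name model; adjust the names to your setup:

```python
import requests

MANAGEMENT_URL = "http://localhost:8081"  # default TorchServe management address

# Register a model with a single worker instead of one worker per visible core.
resp = requests.post(
    f"{MANAGEMENT_URL}/models",
    params={"url": "model.mar", "initial_workers": 1, "synchronous": "true"},
)
print(resp.status_code, resp.text)

# Or scale an already-registered model down to a single worker.
resp = requests.put(
    f"{MANAGEMENT_URL}/models/model",
    params={"min_worker": 1, "max_worker": 1},
)
print(resp.status_code, resp.text)
```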
Hi @agunapal, thanks for your answer. I meant that I am applying the following limits in Kubernetes:
```yaml
resources:
  limits:
    cpu: '1'
```
However, my model is running with workers = 16. Will the performance differ between these two scenarios:
1. torchserve workers=16 with kubernetes cpu=16
2. torchserve workers=16 with kubernetes cpu=1
I'm concerned about whether setting kubernetes cpu=1 will negatively impact the inference performance compared to having kubernetes cpu=16.