Limiting resources in Docker Compose and workers in the model
📚 The doc issue
I understand that the number of workers for my model should be smaller than the number of physical cores (in CPU mode). However, when I limit the resources of my Docker container using the following configuration:
```yaml
deploy:
  resources:
    limits:
      cpus: '1'
      memory: 20G
```
I set the CPU limit to 1 core, but my model is running with workers = 16. How is this possible? When I go into the container and run:
```python
>>> import os
>>> os.cpu_count()
16
```
I see that it returns 16. I'm not sure if this is okay for my model, and I'm concerned whether it will have a negative impact on inference when I scale it.
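For reference, here is the same check extended to read the cgroup CPU quota directly (a minimal sketch that assumes cgroup v2; the files have different names under cgroup v1):

```python
import os

# Python reports the host's core count, regardless of the container's CPU limit.
print("os.cpu_count():", os.cpu_count())  # 16 in my case

# The quota actually enforced on the container (cgroup v2 layout).
# cpu.max contains "<quota> <period>", e.g. "100000 100000" for cpus: '1'.
try:
    with open("/sys/fs/cgroup/cpu.max") as f:
        quota, period = f.read().split()
    if quota != "max":
        print("cgroup CPU limit:", int(quota) / int(period))  # 1.0 here
    else:
        print("no cgroup CPU limit set")
except FileNotFoundError:
    print("cpu.max not found (probably cgroup v1)")
```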
Suggest a potential alternative/fix
No response
Hi @ToanLyHoa, the way to limit workers in TorchServe is to pass `initial_workers=1` when registering the model or to set `minWorkers=1` in the model configuration. If applying Kubernetes limits is not having an effect, that would be a good question for Kubernetes, I guess.
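For example, here is a minimal sketch of doing that through the management API from Python, assuming the default management port 8081 and a hypothetical model archive named model.mar registered under the name model; adjust the names to your setup:

```python
import requests

MANAGEMENT_URL = "http://localhost:8081"  # default TorchServe management address

# Register a model with a single worker instead of one worker per visible core.
resp = requests.post(
    f"{MANAGEMENT_URL}/models",
    params={"url": "model.mar", "initial_workers": 1, "synchronous": "true"},
)
print(resp.status_code, resp.text)

# Or scale an already-registered model down to a single worker.
resp = requests.put(
    f"{MANAGEMENT_URL}/models/model",
    params={"min_worker": 1, "max_worker": 1},
)
print(resp.status_code, resp.text)
```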
Hi @agunapal, thanks for your answer. I meant that I am applying the following limits in Kubernetes:
```yaml
resources:
  limits:
    cpu: '1'
```
However, my model is running with workers = 16. Will the performance differ between these two scenarios:
1. torchserve workers=16 with kubernetes cpu=16
2. torchserve workers=16 with kubernetes cpu=1
I'm concerned about whether setting kubernetes cpu=1 will negatively impact the inference performance compared to having kubernetes cpu=16.