[minor issue] The BentoML image becomes too heavy after including the dependency on nvidia-ml-py3.
https://github.com/bentoml/BentoML/blob/cc765bba83501f446297de31fdc819cd7dcc2901/pyproject.toml#L40C23-L40C23
To be precise, build times have increased since the pynvml<12 dependency was added.
The Bento image size increased by 2-4 GB (in my case, PyTorch CPU).
Given that most model serving is GPU-based anyway, adding the dependency makes sense, but I think it would be more useful if there were an option to exclude the GPU extras, for example something like bentoml[cpu-only]
or
```yaml
# bentofile.yaml
docker:
  cuda_enable: false
  # cuda_version: "11.6.2"
```
I don't think this is caused by this dependency specifically. There are two reasons here:
- The reason for using a CUDA-based base image is that, even on nodes that have an older CUDA version, the container won't be affected by the system CUDA. nvidia-container-toolkit should just be able to use the newer CUDA version from the container with the existing GPU on the node.
- There is also a quirk with torch: it installs the NVIDIA PyPI equivalents of all the CUDA and cuDNN libraries. This means that in the resulting container there are two places containing CUDA and cuDNN headers.
One solution is to adjust PYTHONPATH so that the container picks up the CUDA headers from the PyPI packages, but honestly I think that is a bit too much of a hack.
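As an aside, for CPU-only deployments one user-side workaround is to point pip at PyTorch's CPU wheel index so that the CUDA-variant torch wheels and their nvidia-* dependencies are never pulled into the image. A minimal sketch, assuming bentofile.yaml's python.extra_index_url build option:

```yaml
# bentofile.yaml -- sketch of a CPU-only build (assumes the python.extra_index_url option)
python:
  packages:
    - torch                    # resolved against the CPU index below, so no nvidia-* CUDA wheels
  extra_index_url:
    - https://download.pytorch.org/whl/cpu
```

This only addresses the torch side of the duplication; the CUDA base image is a separate concern.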
I think this is a compromise that we will have to take for now.
I don't think we should have a CPU-only option. But good discussion regardless.
Hi @aarnphm, don't you think that a simple reorganization of the dependencies in pyproject.toml would allow BentoML to offer a CPU-only version? If you are interested, I could propose a PR.
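For illustration, the kind of reorganization I have in mind would look roughly like this (a sketch only; the gpu extra name and the grouping are hypothetical, not the current pyproject.toml layout):

```toml
# pyproject.toml -- hypothetical split of GPU-only dependencies into an optional extra
[project]
name = "bentoml"
dependencies = [
  # ...core, CUDA-free dependencies stay here...
]

[project.optional-dependencies]
# GPU users would install `bentoml[gpu]` to get NVML-based GPU monitoring
gpu = ["pynvml<12"]
```

CPU-only users could then install plain bentoml and skip the NVIDIA-related packages entirely.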
Removing the NVIDIA drivers saves ~3 GB for CPU inference, which is huge.