[minor issue] The BentoML image becomes too heavy after including the dependency on nvidia-ml-py3.
https://github.com/bentoml/BentoML/blob/cc765bba83501f446297de31fdc819cd7dcc2901/pyproject.toml#L40C23-L40C23
To be precise, build times have increased since the pynvml<12 dependency was added.
The Bento image size increased by 2-4 GB (in my case, PyTorch CPU).
Given that most model serving is GPU-based anyway, adding the dependency makes sense, but I think it would be more useful if there were an option to exclude the GPU extras, for example something like bentoml[cpu-only]
or
```yaml
# bentofile.yaml
docker:
  cuda_enable: false
  # cuda_version: "11.6.2"
```
I don't think this is caused by this dependency specifically. There are two reasons here:
- The reason for using a CUDA-based base image is that, even on nodes that have an older CUDA version, the container won't be affected by the system CUDA. nvidia-container-toolkit should just be able to use the newer CUDA version from the container with the existing GPU on the node.
- There is also a quirk with torch: it installs the NVIDIA PyPI equivalents of all the CUDA and cuDNN libraries. This means that in the resulting container there are two places containing CUDA and cuDNN headers.
One solution is to adjust PYTHONPATH so that the container picks up the CUDA headers from the PyPI packages, but honestly I think that is a bit too much of a hack.
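As an aside, for CPU-only deployments one user-side workaround is to point pip at PyTorch's CPU wheel index so that the CUDA-variant torch wheels and their nvidia-* dependencies are never pulled into the image. A minimal sketch, assuming bentofile.yaml's python.extra_index_url build option:

```yaml
# bentofile.yaml -- sketch of a CPU-only build (assumes the python.extra_index_url option)
python:
  packages:
    - torch                    # resolved against the CPU index below, so no nvidia-* CUDA wheels
  extra_index_url:
    - https://download.pytorch.org/whl/cpu
```

This only addresses the torch side of the duplication; the CUDA base image is a separate concern.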
I think this is a compromise that we will have to take for now.
I don't think we should have a CPU-only option. But good discussion regardless.
Hi @aarnphm, don't you think that a simple reorganization of the dependencies in pyproject.toml would allow BentoML to offer a CPU-only version? If you are interested, I could propose a PR.
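For illustration, the kind of reorganization I have in mind would look roughly like this (a sketch only; the gpu extra name and the grouping are hypothetical, not the current pyproject.toml layout):

```toml
# pyproject.toml -- hypothetical split of GPU-only dependencies into an optional extra
[project]
name = "bentoml"
dependencies = [
  # ...core, CUDA-free dependencies stay here...
]

[project.optional-dependencies]
# GPU users would install `bentoml[gpu]` to get NVML-based GPU monitoring
gpu = ["pynvml<12"]
```

CPU-only users could then install plain bentoml and skip the NVIDIA-related packages entirely.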
Removing the NVIDIA drivers saves ~3 GB for CPU inference, which is huge.