bug: bento build fails in a non-GPU environment because NvidiaGpuResource.from_system throws

Open · Epsilon314 opened this issue 6 months ago · 1 comment

Describe the bug

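Building a bento fails on a machine that has the CUDA toolkit installed but no Nvidia GPU. The failure comes from the method NvidiaGpuResource.from_system, which throws in the following block. The block only catches NVMLError_LibraryNotFound and OSError: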

try:
    pynvml.nvmlInit()
    device_count = pynvml.nvmlDeviceGetCount()
    return list(range(device_count))
except (pynvml.NVMLError_LibraryNotFound, OSError):
    logger.debug("GPU not detected. Unable to initialize pynvml lib.")
    return []

The exception pynvml.NVMLError_DriverNotLoaded may also need to be caught, to cover the case where the NVML library is present but no GPU is.
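
A minimal sketch of the suggested change, written as a standalone function rather than as a patch to BentoML itself. The name detect_nvidia_gpus is hypothetical and is not part of BentoML. The sketch assumes pynvml comes from the nvidia-ml-py package listed below, and it treats a missing pynvml module the same as a missing GPU so that it also runs on machines without the package.

```python
import logging
from typing import List

logger = logging.getLogger(__name__)


def detect_nvidia_gpus() -> List[int]:
    """Return the indices of visible Nvidia GPUs, or [] if none are usable.

    Hypothetical helper mirroring the try/except block quoted in this issue.
    """
    try:
        import pynvml  # assumption: provided by the nvidia-ml-py package
    except ImportError:
        logger.debug("pynvml is not installed. Assuming no GPU.")
        return []

    try:
        pynvml.nvmlInit()
        device_count = pynvml.nvmlDeviceGetCount()
        return list(range(device_count))
    except (
        pynvml.NVMLError_LibraryNotFound,  # NVML shared library is missing
        pynvml.NVMLError_DriverNotLoaded,  # library present, driver/GPU absent
        OSError,
    ):
        logger.debug("GPU not detected. Unable to initialize pynvml lib.")
        return []


if __name__ == "__main__":
    print(detect_nvidia_gpus())
```

Catching the base class pynvml.NVMLError instead would cover other NVML failures as well, at the cost of hiding errors that may be worth surfacing. Which of the two is preferable is a judgment call for the maintainers.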

To reproduce

No response

Expected behavior

No response

Environment

Environment variable

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''

System information

bentoml: 1.1.10
python: 3.8.18
platform: Linux-5.4.143.bsk.8-amd64-x86_64-with-glibc2.28
uid_gid: 1001:1001

pip_packages
accelerate==0.25.0
aiohttp==3.9.1
aiosignal==1.3.1
anyio==4.1.0
appdirs==1.4.4
asgiref==3.7.2
async-timeout==4.0.3
attrs==23.1.0
bentoml==1.1.10
bitsandbytes==0.41.3.post2
build==0.10.0
cattrs==23.1.2
certifi==2023.11.17
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.7
click-option-group==0.5.6
cloudpickle==3.0.0
coloredlogs==15.0.1
contextlib2==21.6.0
cuda-python==12.3.0
datasets==2.15.0
deepmerge==1.1.0
Deprecated==1.2.14
diffusers==0.24.0
dill==0.3.7
distlib==0.3.8
distro==1.8.0
einops==0.7.0
exceptiongroup==1.2.0
fastcore==1.5.29
filelock==3.13.1
filetype==1.2.0
frozenlist==1.4.0
fs==2.4.16
fsspec==2023.10.0
ghapi==1.0.4
h11==0.14.0
httpcore==1.0.2
httpx==0.25.2
huggingface-hub==0.19.4
humanfriendly==10.0
idna==3.6
importlib-metadata==6.11.0
inflection==0.5.1
Jinja2==3.1.2
markdown-it-py==3.0.0
MarkupSafe==2.1.3
mdurl==0.1.2
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.15
mypy-extensions==1.0.0
networkx==3.1
numpy==1.24.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==11.525.150
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
onediffusion==0.0.3
openllm==0.4.36
openllm-client==0.4.36
openllm-core==0.4.36
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
optimum==1.15.0
orjson==3.9.10
packaging==23.2
pandas==2.0.3
pathspec==0.12.1
Pillow==10.1.0
pip-requirements-parser==32.0.1
pip-tools==7.3.0
platformdirs==4.1.0
prometheus-client==0.19.0
protobuf==4.25.1
psutil==5.9.6
pyarrow==14.0.1
pyarrow-hotfix==0.6
pydantic==1.10.13
Pygments==2.17.2
pyparsing==3.1.1
pyproject_hooks==1.0.0
python-dateutil==2.8.2
python-json-logger==2.0.7
python-multipart==0.0.6
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.2
regex==2023.10.3
requests==2.31.0
rich==13.7.0
safetensors==0.4.1
schema==0.7.5
scipy==1.10.1
sentencepiece==0.1.99
simple-di==0.1.5
six==1.16.0
sniffio==1.3.0
starlette==0.33.0
sympy==1.12
tabulate==0.9.0
tokenizers==0.15.0
tomli==2.0.1
torch==2.1.1
tornado==6.4
tqdm==4.66.1
transformers==4.36.0
triton==2.1.0
typing_extensions==4.9.0
tzdata==2023.3
urllib3==2.1.0
uvicorn==0.24.0.post1
virtualenv==20.25.0
watchfiles==0.21.0
wcwidth==0.2.12
wrapt==1.16.0
xxhash==3.4.1
yarl==1.9.4
zipp==3.17.0

Epsilon314, Dec 22 '23 11:12