DALI
Add error message when GPU is not available
Category:
Other (e.g. Documentation, Tests, Configuration)
Description:
Currently, when a DALI pipeline is created in Triton but the user forgets to pass the --gpus
flag to the run command, they get an obscure error message:
dlopen libcuda.so failed!. Please install GPU dirverTraceback (most recent call last):
File "<string>", line 8, in <module>
File "/opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.10/site-packages/nvidia/dali/_utils/autoserialize.py", line 77, in invoke_autoserialize
dali_pipeline().serialize(filename=filename)
File "/opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.10/site-packages/nvidia/dali/pipeline.py", line 1261, in serialize
self._init_pipeline_backend()
File "/opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.10/site-packages/nvidia/dali/pipeline.py", line 725, in _init_pipeline_backend
self._pipe = b.Pipeline(self._max_batch_size,
RuntimeError: [/opt/dali/dali/core/device_guard.cc:31] Assert on "cuInitChecked()" failed: Failed to load libcuda.so. Check your library paths and if the driver is installed correctly.
Stacktrace (31 entries):
[frame 0]: /opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.10/site-packages/nvidia/dali/libdali_core.so(+0x233fb) [0x7fed69bb13fb]
[frame 1]: /opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.10/site-packages/nvidia/dali/libdali_core.so(dali::DeviceGuard::DeviceGuard(int)+0x1a8) [0x7fed69bd4548]
[frame 2]: /opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.10/site-packages/nvidia/dali/libdali.so(dali::Pipeline::Init(int, int, int, long, bool, bool, bool, unsigned long, bool, int, int, dali::QueueSizes)+0x50) [0x7fed6f81a620]
[frame 3]: /opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.10/site-packages/nvidia/dali/backend_impl.cpython-310-x86_64-linux-gnu.so(dali::Pipeline::Pipeline(int, int, int, long, bool, int, bool, unsigned long, bool, int, int)+0x363) [0x7fed64c05a53]
This PR introduces a more descriptive error message.
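For illustration only, the idea of failing early with a clearer message can be sketched in Python as below. This is not the change made in this PR. The helper name `check_gpu_available`, the injectable `loader` parameter, the `libcuda.so.1` library name, and the message wording are all assumptions made for the sketch.

```python
import ctypes


def check_gpu_available(loader=ctypes.CDLL):
    """Raise a descriptive error if the CUDA driver library cannot be loaded.

    Hypothetical helper, not part of DALI. `loader` is injectable so the
    check can be exercised on a machine without a GPU.
    """
    try:
        # Assumption: the driver library is exposed under this name on Linux.
        loader("libcuda.so.1")
    except OSError as e:
        # ctypes.CDLL raises OSError when the shared library cannot be loaded.
        raise RuntimeError(
            "CUDA driver library (libcuda.so) could not be loaded. "
            "If running in a container, check that GPUs were exposed to it "
            "(for example with the --gpus flag)."
        ) from e
```

A check like this would run before the pipeline is built or serialized, so the user sees the hint about the --gpus flag instead of the long stack trace.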
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
- [ ] Existing tests apply
- [ ] New tests added
- [ ] Python tests
- [ ] GTests
- [ ] Benchmark
- [ ] Other
- [ ] N/A
Checklist
Documentation
- [ ] Existing documentation applies
- [ ] Documentation updated
- [ ] Docstring
- [ ] Doxygen
- [ ] RST
- [ ] Jupyter
- [ ] Other
- [ ] N/A
DALI team only
Requirements
- [ ] Implements new requirements
- [ ] Affects existing requirements
- [ ] N/A
REQ IDs: N/A
JIRA TASK: N/A
!build
CI MESSAGE: [12924987]: BUILD STARTED
CI MESSAGE: [12924987]: BUILD FAILED
!build
!build
CI MESSAGE: [13002156]: BUILD STARTED
CI MESSAGE: [13002210]: BUILD STARTED
CI MESSAGE: [13002156]: BUILD PASSED
CI MESSAGE: [13002210]: BUILD FAILED
CI MESSAGE: [13002210]: BUILD PASSED