Restoring TVMOp tests
Description
Restoring TVMOp tests. #18204 #18526 #17840
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage:
- Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
- Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
- Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
- For user-facing API changes, API doc string has been updated.
- For new C++ functions in header files, their functionalities and arguments are documented.
- For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
- Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
Changes
- [ ] Feature1, tests, (and when applicable, API doc)
- [ ] Feature2, tests, (and when applicable, API doc)
Comments
- If this change is a backward incompatible change, why must this change be made.
- Interesting edge cases to note here
Hey @jinboci, thanks for submitting the PR! All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:
- To trigger all jobs: @mxnet-bot run ci [all]
- To trigger specific jobs: @mxnet-bot run ci [job1, job2]
CI supported jobs: [clang, centos-cpu, miscellaneous, sanity, windows-gpu, windows-cpu, unix-gpu, centos-gpu, unix-cpu, website, edge]
Note: Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin. All CI tests must pass before the PR can be merged.
@mxnet-bot run ci [unix-cpu, unix-gpu]
Jenkins CI successfully triggered : [unix-gpu, unix-cpu]
@mxnet-bot run ci [centos-cpu]
Jenkins CI successfully triggered : [centos-cpu]
You need to investigate why libcuda is not found in the container. Previously there was a hack of putting /usr/local/cuda/compat on the path, but that may not be the correct solution. AFAIK libcuda is provided by https://github.com/NVIDIA/nvidia-docker/ inside the container, based on the host system's libcuda, and typically only on a host system with GPUs.
@leezu Just to check whether my understanding is correct: libcuda.so exists on the hosts that build mxnet but not on the hosts that run the tests, while libcudart.so exists on both. Is that correct?
@yzhliu It should be the other way round. Let's open the CI Docker container with docker run -it mxnetci/build.ubuntu_gpu_cu102 /bin/bash and look at the shared libraries in /usr/local/cuda:
root@de49f0e1966c:/work/mxnet# find /usr/local/cuda-10.2 -name "*.so*"
/usr/local/cuda-10.2/compat/libnvidia-ptxjitcompiler.so.440.33.01
/usr/local/cuda-10.2/compat/libcuda.so
/usr/local/cuda-10.2/compat/libcuda.so.1
/usr/local/cuda-10.2/compat/libcuda.so.440.33.01
/usr/local/cuda-10.2/compat/libnvidia-fatbinaryloader.so.440.33.01
/usr/local/cuda-10.2/compat/libnvidia-ptxjitcompiler.so
/usr/local/cuda-10.2/compat/libnvidia-ptxjitcompiler.so.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppim.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcurand.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnpps.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppial.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libOpenCL.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppist.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcuinj64.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libOpenCL.so.1.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppig.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppidei.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolver.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libaccinj64.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicom.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libaccinj64.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libOpenCL.so.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppif.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufftw.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libaccinj64.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolverMg.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcuinj64.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusparse.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvgraph.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppim.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppicc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcurand.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnpps.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppial.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvrtc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppist.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcuda.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppig.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppidei.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcusolver.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppicom.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppif.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcufftw.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcusolverMg.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcusparse.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvgraph.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvjpeg.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppisu.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppitc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcufft.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvjpeg.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcuinj64.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvperf_target.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppisu.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppitc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2.75
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvperf_host.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufft.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppisu.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppist.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvjpeg.so.10.3.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppitc.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusparse.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcurand.so.10.1.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppif.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnpps.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppc.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppial.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnpps.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppidei.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppc.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicom.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufftw.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolverMg.so.10.3.0.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufftw.so.10.1.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicc.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicc.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcurand.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicom.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolverMg.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppial.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppist.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusparse.so.10.3.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolver.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvgraph.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppim.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so.1.0.0
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc-builtins.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufft.so.10.1.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppidei.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufft.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppig.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvjpeg.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppig.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolver.so.10.3.0.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppif.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppisu.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppitc.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppim.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvgraph.so.10.2.89
/usr/local/cuda-10.2/nvvm/lib64/libnvvm.so.3.3.0
/usr/local/cuda-10.2/nvvm/lib64/libnvvm.so
/usr/local/cuda-10.2/nvvm/lib64/libnvvm.so.3
/usr/local/cuda-10.2/nvvmx/lib64/libnvvm.so.3.3.0
/usr/local/cuda-10.2/nvvmx/lib64/libnvvm.so
/usr/local/cuda-10.2/nvvmx/lib64/libnvvm.so.3
/usr/local/cuda-10.2/extras/Sanitizer/libsanitizer-public.so
Because we don't use the nvidia docker command to run the container, only stubs/libcuda.so is available. If we're on a host with GPUs, we can use docker run --gpus all -it mxnetci/build.ubuntu_gpu_cu102 /bin/bash, and the libcuda.so from the host as well as the host GPUs will be available inside the container. But on a CPU host this just leads to:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded\\\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled
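A quick way to see which libcuda, if any, the dynamic loader would resolve inside such a container is a small ctypes probe; a minimal sketch, assuming a Python interpreter is available in the image:

import ctypes
import ctypes.util

# find_library consults the loader's usual search locations (ldconfig cache).
# In a stubs-only container this typically prints None; on a GPU host with
# the NVIDIA driver installed it prints something like 'libcuda.so.1'.
print("libcuda found:", ctypes.util.find_library("cuda"))

# Attempting the dlopen directly shows whether loading would succeed at all.
try:
    ctypes.CDLL("libcuda.so.1")
    print("libcuda.so.1 loads")
except OSError as e:
    print("libcuda.so.1 failed to load:", e)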
The problem is that some part of the tvmop setup currently requires libcuda.so to be available (it's listed as a shared-library dependency of some shared library that is opened). We need to check which library is introducing the dependency and consider how to fix it. Ideally there shouldn't be a dependency on libcuda.so, as it's only available on GPU hosts.
You can also refer to https://github.com/NVIDIA/nvidia-container-toolkit/issues/185 for a little background. The problem with the compat/libcuda.so, AFAIK, is that it does not necessarily match the driver version of the host system.
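To act on the suggestion of finding which library introduces the dependency, one rough approach is to scan each candidate shared object with ldd; a sketch only, where the library paths and names (e.g. libtvmop.so) are assumptions about the local build layout:

import subprocess

# Check each candidate shared object for a libcuda entry in its
# dynamic-dependency list; adjust the paths to the local build tree.
for lib in ["build/libmxnet.so", "build/libtvm.so", "build/libtvmop.so"]:
    out = subprocess.run(["ldd", lib], capture_output=True, text=True).stdout
    hits = [line for line in out.splitlines() if "libcuda.so" in line]
    print(lib, "->", hits if hits else "no libcuda dependency")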
@yzhliu @leezu Thank you for your suggestions. I tried to directly disable the linkage of libcuda.so with:
diff --git a/cmake/modules/CUDA.cmake b/cmake/modules/CUDA.cmake
index 936bb681b..32d13de38 100644
--- a/cmake/modules/CUDA.cmake
+++ b/cmake/modules/CUDA.cmake
@@ -35,7 +35,7 @@ if(USE_CUDA)
list(APPEND TVM_LINKER_LIBS ${CUDA_NVRTC_LIBRARY})
list(APPEND TVM_RUNTIME_LINKER_LIBS ${CUDA_CUDART_LIBRARY})
- list(APPEND TVM_RUNTIME_LINKER_LIBS ${CUDA_CUDA_LIBRARY})
+ #list(APPEND TVM_RUNTIME_LINKER_LIBS ${CUDA_CUDA_LIBRARY})
list(APPEND TVM_RUNTIME_LINKER_LIBS ${CUDA_NVRTC_LIBRARY})
if(USE_CUDNN)
diff --git a/cmake/util/FindCUDA.cmake b/cmake/util/FindCUDA.cmake
index f971c87f2..5e2118148 100644
--- a/cmake/util/FindCUDA.cmake
+++ b/cmake/util/FindCUDA.cmake
@@ -58,9 +58,9 @@ macro(find_cuda use_cuda)
# additional libraries
if(CUDA_FOUND)
if(MSVC)
- find_library(CUDA_CUDA_LIBRARY cuda
- ${CUDA_TOOLKIT_ROOT_DIR}/lib/x64
- ${CUDA_TOOLKIT_ROOT_DIR}/lib/Win32)
+ #find_library(CUDA_CUDA_LIBRARY cudart
+ #${CUDA_TOOLKIT_ROOT_DIR}/lib/x64
+ #${CUDA_TOOLKIT_ROOT_DIR}/lib/Win32)
find_library(CUDA_NVRTC_LIBRARY nvrtc
${CUDA_TOOLKIT_ROOT_DIR}/lib/x64
${CUDA_TOOLKIT_ROOT_DIR}/lib/Win32)
@@ -74,13 +74,13 @@ macro(find_cuda use_cuda)
${CUDA_TOOLKIT_ROOT_DIR}/lib/x64
${CUDA_TOOLKIT_ROOT_DIR}/lib/Win32)
else(MSVC)
- find_library(_CUDA_CUDA_LIBRARY cuda
- PATHS ${CUDA_TOOLKIT_ROOT_DIR}
- PATH_SUFFIXES lib lib64 targets/x86_64-linux/lib targets/x86_64-linux/lib/stubs lib64/stubs
- NO_DEFAULT_PATH)
- if(_CUDA_CUDA_LIBRARY)
- set(CUDA_CUDA_LIBRARY ${_CUDA_CUDA_LIBRARY})
- endif()
+ #find_library(_CUDA_CUDA_LIBRARY cudart
+ #PATHS ${CUDA_TOOLKIT_ROOT_DIR}
+ #PATH_SUFFIXES lib lib64 targets/x86_64-linux/lib targets/x86_64-linux/lib/stubs lib64/stubs
+ #NO_DEFAULT_PATH)
+ #if(_CUDA_CUDA_LIBRARY)
+ #set(CUDA_CUDA_LIBRARY ${_CUDA_CUDA_LIBRARY})
+ #endif()
find_library(CUDA_NVRTC_LIBRARY nvrtc
PATHS ${CUDA_TOOLKIT_ROOT_DIR}
PATH_SUFFIXES lib lib64 targets/x86_64-linux/lib targets/x86_64-linux/lib/stubs lib64/stubs lib/x86_64-linux-gnu
However, I got errors while building tvm:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/Documents/tvm/python/tvm/__init__.py", line 25, in <module>
from ._ffi.base import TVMError, __version__
File "/home/ubuntu/Documents/tvm/python/tvm/_ffi/__init__.py", line 28, in <module>
from .base import register_error
File "/home/ubuntu/Documents/tvm/python/tvm/_ffi/base.py", line 62, in <module>
_LIB, _LIB_NAME = _load_lib()
File "/home/ubuntu/Documents/tvm/python/tvm/_ffi/base.py", line 50, in _load_lib
lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
File "/home/ubuntu/anaconda3/lib/python3.7/ctypes/__init__.py", line 364, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /home/ubuntu/Documents/tvm/build/libtvm.so: undefined symbol: cuLaunchKernel
It seems that cuLaunchKernel is a function needed from libcuda.so (though I am not sure). How could we call this function without linking against libcuda.so?
@jinboci would it be possible to dlopen libcuda at runtime?
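For illustration, a minimal sketch of that idea in Python ctypes, loading libcuda lazily at runtime instead of declaring it as a link-time dependency; in libtvm itself this would be done in C++ via dlopen/dlsym, so treat this only as a model of the mechanism:

import ctypes

libcuda = None
try:
    # RTLD_GLOBAL makes driver-API symbols such as cuLaunchKernel visible
    # to shared libraries loaded afterwards.
    libcuda = ctypes.CDLL("libcuda.so.1", mode=ctypes.RTLD_GLOBAL)
except OSError:
    pass  # CPU-only host: fall back to code paths that don't need the driver

if libcuda is not None:
    # cuInit(0) returns 0 (CUDA_SUCCESS) when a working driver is present.
    print("cuInit ->", libcuda.cuInit(0))
else:
    print("libcuda unavailable; skipping GPU code paths")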
@leezu @yzhliu Hi, I am still unclear about:
- Does the machine in CI that builds mxnet provide libcuda.so?
- When USE_TVM_OP is OFF, does building mxnet require a dependency on libcuda.so?
I compiled mxnet with USE_TVM_OP=OFF and USE_CUDA=ON, USE_CUDNN=ON, and got:
(base) ubuntu@ip-172-31-37-194:~/Documents/mxnet/build$ ldd libmxnet.so
linux-vdso.so.1 (0x00007ffda2ae3000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f68de615000)
libopenblas.so.0 => /usr/local/lib/libopenblas.so.0 (0x00007f68dd688000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f68dd480000)
libomp.so => /home/ubuntu/Documents/mxnet/build/3rdparty/openmp/runtime/src/libomp.so (0x00007f68dd19a000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f68dcf7b000)
libcudnn.so.7 => /usr/local/cuda/lib64/libcudnn.so.7 (0x00007f68c795c000)
libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007f68c6774000)
libnvidia-ml.so.1 => /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 (0x00007f68c614e000)
libnccl.so.2 => /usr/local/cuda/lib/libnccl.so.2 (0x00007f68bf6fa000)
libopencv_imgcodecs.so.4.2 => /usr/local/lib/libopencv_imgcodecs.so.4.2 (0x00007f68bed0d000)
libopencv_imgproc.so.4.2 => /usr/local/lib/libopencv_imgproc.so.4.2 (0x00007f68bd409000)
libopencv_core.so.4.2 => /usr/local/lib/libopencv_core.so.4.2 (0x00007f68bc124000)
libcudart.so.10.0 => /usr/local/cuda/lib64/libcudart.so.10.0 (0x00007f68bbeaa000)
libcufft.so.10.0 => /usr/local/cuda/lib64/libcufft.so.10.0 (0x00007f68b59f6000)
libcublas.so.10.0 => /usr/local/cuda/lib64/libcublas.so.10.0 (0x00007f68b1460000)
libcusolver.so.10.0 => /usr/local/cuda/lib64/libcusolver.so.10.0 (0x00007f68a8d79000)
libcurand.so.10.0 => /usr/local/cuda/lib64/libcurand.so.10.0 (0x00007f68a4c12000)
libnvrtc.so.10.0 => /usr/local/cuda/lib64/libnvrtc.so.10.0 (0x00007f68a35f6000)
libnvToolsExt.so.1 => /usr/local/cuda/lib64/libnvToolsExt.so.1 (0x00007f68a33ed000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f68a3064000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f68a2cc6000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f68a2aae000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f68a26bd000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6905ee6000)
libgfortran.so.4 => /usr/lib/x86_64-linux-gnu/libgfortran.so.4 (0x00007f68a22de000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f68a20af000)
libjpeg.so.8 => /usr/lib/x86_64-linux-gnu/libjpeg.so.8 (0x00007f68a1e47000)
libpng16.so.16 => /usr/lib/x86_64-linux-gnu/libpng16.so.16 (0x00007f68a1c15000)
libtiff.so.5 => /usr/lib/x86_64-linux-gnu/libtiff.so.5 (0x00007f68a199e000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f68a1781000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f68a1541000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f68a131b000)
libjbig.so.0 => /usr/lib/x86_64-linux-gnu/libjbig.so.0 (0x00007f68a110d000)
@leezu I set some breakpoints (I am not sure if this is the right approach). Building only TVM:
>>> import tvm
> /home/ubuntu/anaconda3/lib/python3.7/ctypes/__init__.py(365)__init__()
-> self._handle = _dlopen(self._name, mode)
(Pdb) c
> /home/ubuntu/Documents/tvm/python/tvm/_ffi/base.py(51)_load_lib()
-> lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
(Pdb) c
> /home/ubuntu/anaconda3/lib/python3.7/ctypes/__init__.py(365)__init__()
-> self._handle = _dlopen(self._name, mode)
(Pdb) _dlopen("libcuda.so")
94685554591904
@jinboci for libmxnet.so, it currently has a libcuda dependency when compiled with nvrtc. This will be fixed eventually (https://github.com/apache/incubator-mxnet/issues/17858), but if it blocks the TVMOp tests, I suggest you simply disable the nvrtc feature in the tvmop builds. Then the dependency on libcuda.so.1 in libmxnet.so will disappear.
You need to check whether the error is due to libmxnet.so or libtvm.so. Once you have identified the cause, the next step is to look into fixing it.
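One simple way to narrow that down is to dlopen each library in isolation and see which load fails; a hedged sketch, with the paths being assumptions about the build directory:

import ctypes

# The library whose load raises "undefined symbol: cuLaunchKernel" is the
# one carrying the unresolved libcuda dependency.
for path in ["build/libtvm.so", "build/libmxnet.so"]:
    try:
        ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
        print(path, "loaded fine")
    except OSError as e:
        print(path, "failed:", e)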
@leezu in CI, is mxnet built without nvrtc?
@yzhliu NVRTC is enabled by default and thus built by the CI unless disabled: https://github.com/apache/incubator-mxnet/blob/497bf7efb403a9174817f07ab3d2f9be033845ad/CMakeLists.txt#L82
If libmxnet's dependency is causing the issue, we can just disable this flag in the TVMOp builds until libmxnet.so is fixed. Based on the error logs posted in this issue, though, I'm not sure whether the error is due to libtvm or libmxnet.