Restoring TVMOp tests
Description
Restoring TVMOp tests. #18204 #18526 #17840
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage:
- Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
- Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
- Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
- For user-facing API changes, API doc string has been updated.
- For new C++ functions in header files, their functionalities and arguments are documented.
- For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
- Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
Changes
- [ ] Feature1, tests, (and when applicable, API doc)
- [ ] Feature2, tests, (and when applicable, API doc)
Comments
- If this change is a backward incompatible change, why must this change be made.
- Interesting edge cases to note here
Hey @jinboci, thanks for submitting the PR! All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:
- To trigger all jobs: @mxnet-bot run ci [all]
- To trigger specific jobs: @mxnet-bot run ci [job1, job2]
CI supported jobs: [clang, centos-cpu, miscellaneous, sanity, windows-gpu, windows-cpu, unix-gpu, centos-gpu, unix-cpu, website, edge]
Note: Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin. All CI tests must pass before the PR can be merged.
@mxnet-bot run ci [unix-cpu, unix-gpu]
Jenkins CI successfully triggered : [unix-gpu, unix-cpu]
@mxnet-bot run ci [centos-cpu]
Jenkins CI successfully triggered : [centos-cpu]
You need to investigate why libcuda is not found in the container. Previously there was a hack of putting /usr/local/cuda/compat on the path, but that may not be the correct solution. AFAIK libcuda is provided by https://github.com/NVIDIA/nvidia-docker/ inside the container, based on the host system's libcuda, and typically only on a host system with GPUs.
@leezu Just to check whether my understanding is correct: libcuda.so exists on the hosts that build mxnet but not on the hosts that run the tests, while libcudart.so exists on both. Is that correct?
@yzhliu It should be the other way round. Let's open the CI Docker container with docker run -it mxnetci/build.ubuntu_gpu_cu102 /bin/bash and look at the shared libraries in /usr/local/cuda:
root@de49f0e1966c:/work/mxnet# find /usr/local/cuda-10.2 -name "*.so*"
/usr/local/cuda-10.2/compat/libnvidia-ptxjitcompiler.so.440.33.01
/usr/local/cuda-10.2/compat/libcuda.so
/usr/local/cuda-10.2/compat/libcuda.so.1
/usr/local/cuda-10.2/compat/libcuda.so.440.33.01
/usr/local/cuda-10.2/compat/libnvidia-fatbinaryloader.so.440.33.01
/usr/local/cuda-10.2/compat/libnvidia-ptxjitcompiler.so
/usr/local/cuda-10.2/compat/libnvidia-ptxjitcompiler.so.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppim.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcurand.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnpps.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppial.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libOpenCL.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppist.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcuinj64.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libOpenCL.so.1.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppig.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppidei.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolver.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libaccinj64.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicom.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libaccinj64.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libOpenCL.so.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppif.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufftw.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libaccinj64.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolverMg.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcuinj64.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusparse.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvgraph.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppim.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppicc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcurand.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnpps.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppial.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvrtc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppist.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcuda.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppig.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppidei.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcusolver.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppicom.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppif.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcufftw.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcusolverMg.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcusparse.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvgraph.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnvjpeg.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppisu.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libnppitc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcufft.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvjpeg.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcuinj64.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvperf_target.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppisu.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppitc.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2.75
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvperf_host.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufft.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppisu.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppist.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvjpeg.so.10.3.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppitc.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusparse.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcurand.so.10.1.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppif.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnpps.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppc.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppial.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnpps.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppidei.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppc.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicom.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so.1
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufftw.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolverMg.so.10.3.0.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufftw.so.10.1.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicc.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicc.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcurand.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppicom.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolverMg.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppial.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppist.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusparse.so.10.3.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.10.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolver.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvgraph.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppim.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvToolsExt.so.1.0.0
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc-builtins.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufft.so.10.1.2.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppidei.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcufft.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppig.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvjpeg.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvrtc.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppig.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcusolver.so.10.3.0.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppif.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppisu.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppitc.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnppim.so.10.2.1.89
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvgraph.so.10.2.89
/usr/local/cuda-10.2/nvvm/lib64/libnvvm.so.3.3.0
/usr/local/cuda-10.2/nvvm/lib64/libnvvm.so
/usr/local/cuda-10.2/nvvm/lib64/libnvvm.so.3
/usr/local/cuda-10.2/nvvmx/lib64/libnvvm.so.3.3.0
/usr/local/cuda-10.2/nvvmx/lib64/libnvvm.so
/usr/local/cuda-10.2/nvvmx/lib64/libnvvm.so.3
/usr/local/cuda-10.2/extras/Sanitizer/libsanitizer-public.so
Because we don't use the nvidia docker command to run the container, only stubs/libcuda.so is available. If we're on a host with GPUs, we can use docker run --gpus all -it mxnetci/build.ubuntu_gpu_cu102 /bin/bash, and the libcuda.so from the host as well as the host GPUs will be available inside the container. But on a CPU host this just leads to:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded\\\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled
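A quick way to see which libcuda, if any, the dynamic loader would resolve inside such a container is a small ctypes probe; a minimal sketch, assuming a Python interpreter is available in the image:

import ctypes
import ctypes.util

# find_library consults the loader's usual search locations (ldconfig cache).
# In a stubs-only container this typically prints None; on a GPU host with
# the NVIDIA driver installed it prints something like 'libcuda.so.1'.
print("libcuda found:", ctypes.util.find_library("cuda"))

# Attempting the dlopen directly shows whether loading would succeed at all.
try:
    ctypes.CDLL("libcuda.so.1")
    print("libcuda.so.1 loads")
except OSError as e:
    print("libcuda.so.1 failed to load:", e)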
The problem is that some part of the tvmop setup currently requires libcuda.so to be available (it's listed as a shared-library dependency of some shared library that is opened). We need to check which library is introducing the dependency and consider how to fix it. Ideally there shouldn't be a dependency on libcuda.so, as it's only available on GPU hosts.
You can also refer to https://github.com/NVIDIA/nvidia-container-toolkit/issues/185 for a little background. The problem with the compat/libcuda.so, AFAIK, is that it does not necessarily match the driver version of the host system.
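To act on the suggestion of finding which library introduces the dependency, one rough approach is to scan each candidate shared object with ldd; a sketch only, where the library paths and names (e.g. libtvmop.so) are assumptions about the local build layout:

import subprocess

# Check each candidate shared object for a libcuda entry in its
# dynamic-dependency list; adjust the paths to the local build tree.
for lib in ["build/libmxnet.so", "build/libtvm.so", "build/libtvmop.so"]:
    out = subprocess.run(["ldd", lib], capture_output=True, text=True).stdout
    hits = [line for line in out.splitlines() if "libcuda.so" in line]
    print(lib, "->", hits if hits else "no libcuda dependency")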
@yzhliu @leezu Thank you for your suggestions. I tried to directly disable the linkage of libcuda.so with:
diff --git a/cmake/modules/CUDA.cmake b/cmake/modules/CUDA.cmake
index 936bb681b..32d13de38 100644
--- a/cmake/modules/CUDA.cmake
+++ b/cmake/modules/CUDA.cmake
@@ -35,7 +35,7 @@ if(USE_CUDA)
list(APPEND TVM_LINKER_LIBS ${CUDA_NVRTC_LIBRARY})
list(APPEND TVM_RUNTIME_LINKER_LIBS ${CUDA_CUDART_LIBRARY})
- list(APPEND TVM_RUNTIME_LINKER_LIBS ${CUDA_CUDA_LIBRARY})
+ #list(APPEND TVM_RUNTIME_LINKER_LIBS ${CUDA_CUDA_LIBRARY})
list(APPEND TVM_RUNTIME_LINKER_LIBS ${CUDA_NVRTC_LIBRARY})
if(USE_CUDNN)
diff --git a/cmake/util/FindCUDA.cmake b/cmake/util/FindCUDA.cmake
index f971c87f2..5e2118148 100644
--- a/cmake/util/FindCUDA.cmake
+++ b/cmake/util/FindCUDA.cmake
@@ -58,9 +58,9 @@ macro(find_cuda use_cuda)
# additional libraries
if(CUDA_FOUND)
if(MSVC)
- find_library(CUDA_CUDA_LIBRARY cuda
- ${CUDA_TOOLKIT_ROOT_DIR}/lib/x64
- ${CUDA_TOOLKIT_ROOT_DIR}/lib/Win32)
+ #find_library(CUDA_CUDA_LIBRARY cudart
+ #${CUDA_TOOLKIT_ROOT_DIR}/lib/x64
+ #${CUDA_TOOLKIT_ROOT_DIR}/lib/Win32)
find_library(CUDA_NVRTC_LIBRARY nvrtc
${CUDA_TOOLKIT_ROOT_DIR}/lib/x64
${CUDA_TOOLKIT_ROOT_DIR}/lib/Win32)
@@ -74,13 +74,13 @@ macro(find_cuda use_cuda)
${CUDA_TOOLKIT_ROOT_DIR}/lib/x64
${CUDA_TOOLKIT_ROOT_DIR}/lib/Win32)
else(MSVC)
- find_library(_CUDA_CUDA_LIBRARY cuda
- PATHS ${CUDA_TOOLKIT_ROOT_DIR}
- PATH_SUFFIXES lib lib64 targets/x86_64-linux/lib targets/x86_64-linux/lib/stubs lib64/stubs
- NO_DEFAULT_PATH)
- if(_CUDA_CUDA_LIBRARY)
- set(CUDA_CUDA_LIBRARY ${_CUDA_CUDA_LIBRARY})
- endif()
+ #find_library(_CUDA_CUDA_LIBRARY cudart
+ #PATHS ${CUDA_TOOLKIT_ROOT_DIR}
+ #PATH_SUFFIXES lib lib64 targets/x86_64-linux/lib targets/x86_64-linux/lib/stubs lib64/stubs
+ #NO_DEFAULT_PATH)
+ #if(_CUDA_CUDA_LIBRARY)
+ #set(CUDA_CUDA_LIBRARY ${_CUDA_CUDA_LIBRARY})
+ #endif()
find_library(CUDA_NVRTC_LIBRARY nvrtc
PATHS ${CUDA_TOOLKIT_ROOT_DIR}
PATH_SUFFIXES lib lib64 targets/x86_64-linux/lib targets/x86_64-linux/lib/stubs lib64/stubs lib/x86_64-linux-gnu
However, I got errors while building tvm:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/Documents/tvm/python/tvm/__init__.py", line 25, in <module>
from ._ffi.base import TVMError, __version__
File "/home/ubuntu/Documents/tvm/python/tvm/_ffi/__init__.py", line 28, in <module>
from .base import register_error
File "/home/ubuntu/Documents/tvm/python/tvm/_ffi/base.py", line 62, in <module>
_LIB, _LIB_NAME = _load_lib()
File "/home/ubuntu/Documents/tvm/python/tvm/_ffi/base.py", line 50, in _load_lib
lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
File "/home/ubuntu/anaconda3/lib/python3.7/ctypes/__init__.py", line 364, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /home/ubuntu/Documents/tvm/build/libtvm.so: undefined symbol: cuLaunchKernel
It seems that cuLaunchKernel is a function needed from libcuda.so (though I am not sure). How could we call this function without linking against libcuda.so?
@jinboci would it be possible to dlopen libcuda at runtime?
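For illustration, a minimal sketch of that idea in Python ctypes, loading libcuda lazily at runtime instead of declaring it as a link-time dependency; in libtvm itself this would be done in C++ via dlopen/dlsym, so treat this only as a model of the mechanism:

import ctypes

libcuda = None
try:
    # RTLD_GLOBAL makes driver-API symbols such as cuLaunchKernel visible
    # to shared libraries loaded afterwards.
    libcuda = ctypes.CDLL("libcuda.so.1", mode=ctypes.RTLD_GLOBAL)
except OSError:
    pass  # CPU-only host: fall back to code paths that don't need the driver

if libcuda is not None:
    # cuInit(0) returns 0 (CUDA_SUCCESS) when a working driver is present.
    print("cuInit ->", libcuda.cuInit(0))
else:
    print("libcuda unavailable; skipping GPU code paths")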
@leezu @yzhliu Hi, I am still unclear about:
- Does the machine in CI that builds mxnet provide libcuda.so?
- When USE_TVM_OP is OFF, does building mxnet require a dependency on libcuda.so?
I compiled mxnet with USE_TVM_OP=OFF and USE_CUDA=ON, USE_CUDNN=ON, and got:
(base) ubuntu@ip-172-31-37-194:~/Documents/mxnet/build$ ldd libmxnet.so
linux-vdso.so.1 (0x00007ffda2ae3000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f68de615000)
libopenblas.so.0 => /usr/local/lib/libopenblas.so.0 (0x00007f68dd688000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f68dd480000)
libomp.so => /home/ubuntu/Documents/mxnet/build/3rdparty/openmp/runtime/src/libomp.so (0x00007f68dd19a000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f68dcf7b000)
libcudnn.so.7 => /usr/local/cuda/lib64/libcudnn.so.7 (0x00007f68c795c000)
libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007f68c6774000)
libnvidia-ml.so.1 => /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 (0x00007f68c614e000)
libnccl.so.2 => /usr/local/cuda/lib/libnccl.so.2 (0x00007f68bf6fa000)
libopencv_imgcodecs.so.4.2 => /usr/local/lib/libopencv_imgcodecs.so.4.2 (0x00007f68bed0d000)
libopencv_imgproc.so.4.2 => /usr/local/lib/libopencv_imgproc.so.4.2 (0x00007f68bd409000)
libopencv_core.so.4.2 => /usr/local/lib/libopencv_core.so.4.2 (0x00007f68bc124000)
libcudart.so.10.0 => /usr/local/cuda/lib64/libcudart.so.10.0 (0x00007f68bbeaa000)
libcufft.so.10.0 => /usr/local/cuda/lib64/libcufft.so.10.0 (0x00007f68b59f6000)
libcublas.so.10.0 => /usr/local/cuda/lib64/libcublas.so.10.0 (0x00007f68b1460000)
libcusolver.so.10.0 => /usr/local/cuda/lib64/libcusolver.so.10.0 (0x00007f68a8d79000)
libcurand.so.10.0 => /usr/local/cuda/lib64/libcurand.so.10.0 (0x00007f68a4c12000)
libnvrtc.so.10.0 => /usr/local/cuda/lib64/libnvrtc.so.10.0 (0x00007f68a35f6000)
libnvToolsExt.so.1 => /usr/local/cuda/lib64/libnvToolsExt.so.1 (0x00007f68a33ed000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f68a3064000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f68a2cc6000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f68a2aae000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f68a26bd000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6905ee6000)
libgfortran.so.4 => /usr/lib/x86_64-linux-gnu/libgfortran.so.4 (0x00007f68a22de000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f68a20af000)
libjpeg.so.8 => /usr/lib/x86_64-linux-gnu/libjpeg.so.8 (0x00007f68a1e47000)
libpng16.so.16 => /usr/lib/x86_64-linux-gnu/libpng16.so.16 (0x00007f68a1c15000)
libtiff.so.5 => /usr/lib/x86_64-linux-gnu/libtiff.so.5 (0x00007f68a199e000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f68a1781000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f68a1541000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f68a131b000)
libjbig.so.0 => /usr/lib/x86_64-linux-gnu/libjbig.so.0 (0x00007f68a110d000)
@leezu I set some breakpoints (I am not sure if this is the right approach). Building only TVM:
>>> import tvm
> /home/ubuntu/anaconda3/lib/python3.7/ctypes/__init__.py(365)__init__()
-> self._handle = _dlopen(self._name, mode)
(Pdb) c
> /home/ubuntu/Documents/tvm/python/tvm/_ffi/base.py(51)_load_lib()
-> lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
(Pdb) c
> /home/ubuntu/anaconda3/lib/python3.7/ctypes/__init__.py(365)__init__()
-> self._handle = _dlopen(self._name, mode)
(Pdb) _dlopen("libcuda.so")
94685554591904
@jinboci for libmxnet.so, it currently has a libcuda dependency when compiled with nvrtc. This will be fixed eventually (https://github.com/apache/incubator-mxnet/issues/17858), but if it blocks the TVMOp tests, I suggest you simply disable the nvrtc feature in the tvmop builds. Then the dependency on libcuda.so.1 in libmxnet.so will disappear.
You need to check whether the error is due to libmxnet.so or libtvm.so. Once you have identified the cause, the next step is to look into fixing it.
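One simple way to narrow that down is to dlopen each library in isolation and see which load fails; a hedged sketch, with the paths being assumptions about the build directory:

import ctypes

# The library whose load raises "undefined symbol: cuLaunchKernel" is the
# one carrying the unresolved libcuda dependency.
for path in ["build/libtvm.so", "build/libmxnet.so"]:
    try:
        ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
        print(path, "loaded fine")
    except OSError as e:
        print(path, "failed:", e)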
@leezu in CI, is mxnet built without nvrtc?
@yzhliu NVRTC is enabled by default and thus built by the CI unless disabled: https://github.com/apache/incubator-mxnet/blob/497bf7efb403a9174817f07ab3d2f9be033845ad/CMakeLists.txt#L82
If libmxnet's dependency is causing the issue, we can just disable this flag in the TVMOp builds until libmxnet.so is fixed. Based on the error logs posted in this issue, though, I'm not sure whether the error is due to libtvm or libmxnet.