FBGEMM
Failure in import: undefined symbol error from Python 3.7 + CUDA113
I'm using Python 3.7 and CUDA 11.3. I tried both fbgemm-gpu and fbgemm-gpu-nightly from pip, and both versions fail on import:
[root@/ml-code/data/michelangelo/examples/torchrec_example/test #]pip3 show fbgemm-gpu
Name: fbgemm-gpu
Version: 0.1.2
Summary: UNKNOWN
Home-page: https://github.com/pytorch/fbgemm
Author: FBGEMM Team
Author-email: [email protected]
License: BSD-3
Location: /usr/lib/python3.7/site-packages
Requires:
Required-by:
>>> import fbgemm_gpu
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/site-packages/fbgemm_gpu/__init__.py", line 12, in <module>
    torch.ops.load_library(os.path.join(os.path.dirname(__file__), "fbgemm_gpu_py.so"))
  File "/usr/lib/python3.7/site-packages/torch/_ops.py", line 220, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/lib/python3.7/site-packages/fbgemm_gpu/fbgemm_gpu_py.so: undefined symbol: _ZN2at6detail20computeStorageNbytesEN3c108ArrayRefIlEES3_mm
FBGEMM_GPU depends on an existing PyTorch installation. Have you installed PyTorch already? Try importing torch first:
import torch
import fbgemm_gpu
@jianyuh I installed torch 1.11 + cu113 already.
Details:
>>> torch.__version__
'1.11.0+cu113'
>>> torch.cuda.is_available()
True
>>> import fbgemm_gpu
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/site-packages/fbgemm_gpu/__init__.py", line 12, in <module>
    torch.ops.load_library(os.path.join(os.path.dirname(__file__), "fbgemm_gpu_py.so"))
  File "/usr/lib/python3.7/site-packages/torch/_ops.py", line 220, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/lib/python3.7/site-packages/fbgemm_gpu/fbgemm_gpu_py.so: undefined symbol: _ZN2at6detail20computeStorageNbytesEN3c108ArrayRefIlEES3_mm
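For anyone reproducing this: the mangled symbol in the error demangles to at::detail::computeStorageNbytes(c10::ArrayRef<long>, c10::ArrayRef<long>, unsigned long, unsigned long). An undefined symbol of this kind usually suggests the extension was compiled against a different PyTorch build than the one installed, though that is an assumption here and not something confirmed in this thread. The sketch below collects the versions worth attaching to a report. `try_import` and `report` are hypothetical helper names, not part of torch or fbgemm_gpu.

```python
import importlib


def try_import(module_name):
    """Import a module by name.

    Returns (module, None) on success or (None, error_text) on failure.
    OSError is caught as well, because a failed dlopen of a native
    extension (as in the traceback above) surfaces as OSError.
    """
    try:
        return importlib.import_module(module_name), None
    except (ImportError, OSError) as exc:
        return None, f"{type(exc).__name__}: {exc}"


def report():
    """Print the PyTorch build details and the fbgemm_gpu import result."""
    torch, err = try_import("torch")
    if torch is None:
        print("torch import failed:", err)
        return
    # torch.version.cuda is the CUDA version the wheel was built with.
    print("torch version:", torch.__version__)
    print("torch CUDA build:", torch.version.cuda)

    # torch is imported first on purpose, so its shared libraries are
    # already loaded in the process when fbgemm_gpu is opened.
    _, err = try_import("fbgemm_gpu")
    print("fbgemm_gpu import:", "ok" if err is None else err)


if __name__ == "__main__":
    report()
```

If the fbgemm_gpu line still shows the undefined symbol after torch imports cleanly, the two packages were probably built against different PyTorch versions.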
I also tried installing from source, following the guide on the homepage. The build passes, for example:
[ 98%] Building CXX object bench/CMakeFiles/I8SpmdmBenchmark.dir/I8SpmdmBenchmark.cc.o
[ 99%] Building CXX object bench/CMakeFiles/I8SpmdmBenchmark.dir/BenchUtils.cc.o
[ 99%] Building CXX object bench/CMakeFiles/I8SpmdmBenchmark.dir/__/test/QuantizationHelpers.cc.o
[ 99%] Building CXX object bench/CMakeFiles/I8SpmdmBenchmark.dir/__/test/EmbeddingSpMDMTestUtils.cc.o
[ 99%] Linking CXX executable I8SpmdmBenchmark
make[2]: Leaving directory '/root/FBGEMM/build'
[ 99%] Built target I8SpmdmBenchmark
make[2]: Entering directory '/root/FBGEMM/build'
Scanning dependencies of target PackedFloatInOutBenchmark
make[2]: Leaving directory '/root/FBGEMM/build'
make[2]: Entering directory '/root/FBGEMM/build'
[ 99%] Building CXX object bench/CMakeFiles/PackedFloatInOutBenchmark.dir/PackedFloatInOutBenchmark.cc.o
[ 99%] Building CXX object bench/CMakeFiles/PackedFloatInOutBenchmark.dir/BenchUtils.cc.o
[100%] Building CXX object bench/CMakeFiles/PackedFloatInOutBenchmark.dir/__/test/QuantizationHelpers.cc.o
[100%] Building CXX object bench/CMakeFiles/PackedFloatInOutBenchmark.dir/__/test/EmbeddingSpMDMTestUtils.cc.o
[100%] Linking CXX executable PackedFloatInOutBenchmark
make[2]: Leaving directory '/root/FBGEMM/build'
[100%] Built target PackedFloatInOutBenchmark
make[1]: Leaving directory '/root/FBGEMM/build'
make: Leaving directory '/root/FBGEMM/build'
But I still get the same import error. Checking fbgemm_gpu_py.so with ldd shows that some shared libraries are not resolved:
[root@~/FBGEMM #]ldd /usr/lib/python3.7/site-packages/fbgemm_gpu/fbgemm_gpu_py.so
linux-vdso.so.1 (0x00007ffdbaebb000)
libtorch.so => not found
libc10.so => not found
libcuda.so.1 => /usr/local/nvidia/lib64/libcuda.so.1 (0x00007fce523ab000)
libnvrtc.so.11.2 => /usr/local/cuda/lib64/libnvrtc.so.11.2 (0x00007fce4f676000)
libnvToolsExt.so.1 => /usr/local/cuda/lib64/libnvToolsExt.so.1 (0x00007fce4f46b000)
libcudart.so.11.0 => /usr/local/cuda/lib64/libcudart.so.11.0 (0x00007fce4f1d2000)
libc10_cuda.so => not found
libnvidia-ml.so.1 => /usr/local/nvidia/lib64/libnvidia-ml.so.1 (0x00007fce4eb3a000)
libtorch_cuda.so => not found
libtorch_cuda_cpp.so => not found
libtorch_cpu.so => not found
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fce4eb17000)
libcublas.so.11 => /usr/local/cuda/lib64/libcublas.so.11 (0x00007fce47ad7000)
libtorch_cuda_cu.so => not found
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fce47acd000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fce47949000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fce477c4000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fce477aa000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fce475ea000)
/lib64/ld-linux-x86-64.so.2 (0x00007fce6f32a000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fce475e5000)
libcublasLt.so.11 => /usr/local/cuda/lib64/libcublasLt.so.11 (0x00007fce3ba07000)
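A small sketch for pulling the unresolved entries out of ldd output like the listing above. `missing_libs` and `ldd` are hypothetical helper names, and the sample text is a shortened copy of that listing. One caveat, stated as an assumption about this setup: pip-installed PyTorch normally keeps its libtorch*.so files under the package's own lib directory, which a bare ldd run from the shell does not search, so "not found" for those libraries may not by itself explain the undefined symbol.

```python
import subprocess


def missing_libs(ldd_output):
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Resolved and unresolved entries both look like "name => target".
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


def ldd(path):
    """Run ldd on a shared object and return its stdout as text."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return result.stdout


# Shortened sample copied from the ldd listing in this issue.
sample = """\
linux-vdso.so.1 (0x00007ffdbaebb000)
libtorch.so => not found
libc10.so => not found
libcuda.so.1 => /usr/local/nvidia/lib64/libcuda.so.1 (0x00007fce523ab000)
libc10_cuda.so => not found
"""

print(missing_libs(sample))
```

On a real system, `missing_libs(ldd(path_to_so))` gives the same list for the installed file. Rerunning ldd with LD_LIBRARY_PATH pointing at the torch lib directory is one way to check whether the libtorch entries resolve there.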
GPU: RTX5000. Driver Version: 470.63.01
Hi @chongxiaoc, thanks for sharing! It is a known issue we are working on! As a workaround please build fbgemm_gpu from source.
@chongxiaoc is this resolved for you?
@colin2328 I think we can close it for now. Apparently the RTX5000 (compute capability 7.5) is not supported at the low level.
Closing this issue, as FBGEMM_GPU builds have substantially changed over the last few months. Please feel free to file a new issue if you run into installation issues.