Facing /build/lib/libth_longformer.so: undefined symbol: _ZNK2at6Tensor7is_cudaEv while running longformer
Description
Loading /build/lib/libth_longformer.so fails with "undefined symbol: _ZNK2at6Tensor7is_cudaEv" when running the Longformer example.
Faster Transformer Version: 5.1
Branch: Main
GPU: A100 40GB
OS: Ubuntu 18.04
NVIDIA: 470.82.01, CUDA Version: 11.4
Reproduced Steps
Followed the Longformer guide for v5.1 and ran:
python3 examples/pytorch/longformer/longformer_qa.py \
--ft-longformer-lib build/lib/libth_longformer.so \
--model-dir examples/pytorch/longformer/longformer-large-4096-finetuned-triviaqa \
--passage "Jim Henson was a nice puppet" \
--question "Who was Jim Henson?" \
--repeat-test-num 50
This resulted in:
Traceback (most recent call last):
File "FasterTransformer/examples/pytorch/longformer/longformer_qa.py", line 235, in <module>
main()
File "FasterTransformer/examples/pytorch/longformer/longformer_qa.py", line 169, in main
ft_longformer = build_ft_longformer(model_dir, layer_num, head_num, size_per_head,
File "FasterTransformer/examples/pytorch/longformer/longformer_qa.py", line 35, in build_ft_longformer
ft_encoder = FTLongformerEncoder(weights_file, layer_num, head_num, size_per_head,
File "FasterTransformer/examples/pytorch/longformer/model.py", line 73, in __init__
torch.classes.load_library(ft_longformer_lib)
File "/home/.conda/envs/FTLongformer/lib/python3.10/site-packages/torch/_classes.py", line 48, in load_library
torch.ops.load_library(path)
File "/home/.conda/envs/FTLongformer/lib/python3.10/site-packages/torch/_ops.py", line 255, in load_library
ctypes.CDLL(path)
File "/home/.conda/envs/FTLongformer/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: FasterTransformer/build/lib/libth_longformer.so: undefined symbol: _ZNK2at6Tensor7is_cudaEv
Can you provide information about the Docker image you are using?
The error is caused by the missing symbol
at::Tensor::is_cuda() const
This is most likely due to a PyTorch version mismatch.
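For readers wondering how the mangled name in the error maps to at::Tensor::is_cuda() const, here is a minimal sketch of decoding it by hand. The helper demangle_simple is hypothetical and only handles the simple case seen here (a nested name, an optional const qualifier, and no arguments); a real tool such as c++filt covers the full mangling scheme.

```python
def demangle_simple(symbol):
    """Decode names of the form _ZN[K]<len><id>...Ev; return None otherwise.

    Hypothetical helper for illustration only: it understands nested names
    with an optional const qualifier (K) and an empty parameter list (v).
    """
    if not symbol.startswith("_ZN"):
        return None
    i = 3
    is_const = False
    if i < len(symbol) and symbol[i] == "K":
        is_const = True
        i += 1
    parts = []
    # Each name component is encoded as <decimal length><identifier>.
    while i < len(symbol) and symbol[i].isdigit():
        j = i
        while j < len(symbol) and symbol[j].isdigit():
            j += 1
        length = int(symbol[i:j])
        parts.append(symbol[j:j + length])
        i = j + length
    # "E" closes the nested name, "v" means the function takes no arguments.
    if not parts or symbol[i:] != "Ev":
        return None
    return "::".join(parts) + "()" + (" const" if is_const else "")


print(demangle_simple("_ZNK2at6Tensor7is_cudaEv"))
```

The decoded name shows that the library expects a member function of at::Tensor, which is why the PyTorch build is the first thing to check.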
I am using 1.12.1+cu102, since the docs say "PyTorch: Verify on 1.8.0, >= 1.5.0 should work." I will drop to 1.8.0 and check.
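As a rough illustration of comparing the installed version with the one the docs list as verified, here is a small sketch. The helper name parse_torch_version and the VERIFIED constant are assumptions made for this example; the version strings are the ones quoted in this thread. Comparing the raw strings would be misleading because "1.12.1" sorts before "1.8.0" as text.

```python
import re

# Version the docs list as verified (quoted in this thread).
VERIFIED = (1, 8, 0)


def parse_torch_version(version):
    """Turn a string like '1.12.1+cu102' into a tuple such as (1, 12, 1).

    Hypothetical helper: the '+cu102' suffix is a build tag and is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: " + version)
    return tuple(int(part) for part in match.groups())


installed = parse_torch_version("1.12.1+cu102")
if installed != VERIFIED:
    print("installed", installed, "differs from verified", VERIFIED)
```

With PyTorch installed, the string to pass in would come from torch.__version__.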
Can you try the Docker image suggested in the documentation?
Hi, sorry, I am not planning to use the Docker image. Can you suggest a workaround or fix?
The program cannot find the symbol at::Tensor::is_cuda() const, which is a function in the PyTorch library.
However, we don't know the cause of this issue and don't have a solution for it.
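One way to narrow this down is to check whether a given shared library actually exports the symbol. The sketch below uses only ctypes from the standard library; the helper exports_symbol is hypothetical, and passing None as the path (to search symbols already loaded into the current process) is assumed to work only on POSIX systems.

```python
import ctypes


def exports_symbol(lib_path, symbol):
    """Return True if the library at lib_path exports `symbol`.

    Hypothetical helper. lib_path=None searches the symbols already visible
    in the current process (POSIX only).
    """
    lib = ctypes.CDLL(lib_path)
    try:
        getattr(lib, symbol)  # ctypes resolves the name via the dynamic loader
    except AttributeError:
        return False
    return True


# Sanity check against a symbol every C runtime provides.
print(exports_symbol(None, "malloc"))
```

To apply this to the issue, one could pass the paths of the shared libraries shipped in the installed PyTorch package (assumed here to live in a lib directory next to torch.__file__) together with "_ZNK2at6Tensor7is_cudaEv". If none of them export it, the extension was likely built against a different PyTorch than the one being imported.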