Facing /build/lib/libth_longformer.so: undefined symbol: _ZNK2at6Tensor7is_cudaEv while running longformer

Open HemantTiwariGitHub opened this issue 3 years ago • 5 comments

Description

Loading /build/lib/libth_longformer.so fails with "undefined symbol: _ZNK2at6Tensor7is_cudaEv" while running the Longformer example.

Faster Transformer Version: 5.1
Branch: Main 
GPU: A100 40GB
OS: Ubuntu 18.04
NVIDIA driver: 470.82.01, CUDA Version: 11.4

Reproduced Steps

I followed the Longformer guide for v5.1 and ran:
python3 examples/pytorch/longformer/longformer_qa.py \
    --ft-longformer-lib build/lib/libth_longformer.so \
    --model-dir examples/pytorch/longformer/longformer-large-4096-finetuned-triviaqa \
    --passage "Jim Henson was a nice puppet" \
    --question "Who was Jim Henson?" \
    --repeat-test-num 50

This resulted in:
Traceback (most recent call last):
  File "FasterTransformer/examples/pytorch/longformer/longformer_qa.py", line 235, in <module>
    main()
  File "FasterTransformer/examples/pytorch/longformer/longformer_qa.py", line 169, in main
    ft_longformer = build_ft_longformer(model_dir, layer_num, head_num, size_per_head,
  File "FasterTransformer/examples/pytorch/longformer/longformer_qa.py", line 35, in build_ft_longformer
    ft_encoder = FTLongformerEncoder(weights_file, layer_num, head_num, size_per_head,
  File "FasterTransformer/examples/pytorch/longformer/model.py", line 73, in __init__
    torch.classes.load_library(ft_longformer_lib)
  File "/home/.conda/envs/FTLongformer/lib/python3.10/site-packages/torch/_classes.py", line 48, in load_library
    torch.ops.load_library(path)
  File "/home/.conda/envs/FTLongformer/lib/python3.10/site-packages/torch/_ops.py", line 255, in load_library
    ctypes.CDLL(path)
  File "/home/.conda/envs/FTLongformer/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: FasterTransformer/build/lib/libth_longformer.so: undefined symbol: _ZNK2at6Tensor7is_cudaEv

HemantTiwariGitHub, Aug 21 '22 16:08

Can you provide information about the Docker image you are using?

The error is caused by a missing symbol, which demangles to

at::Tensor::is_cuda() const

This is most likely caused by a PyTorch version mismatch.
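
To see how the mangled name maps to the C++ signature, here is a minimal sketch of a demangler. It is not a general tool: the function name demangle_simple is hypothetical, and it handles only the small subset of Itanium C++ ABI names needed for this symbol (a nested name, an optional const qualifier, and no parameters). For anything else, a real tool such as c++filt is the better choice.

```python
def demangle_simple(symbol: str) -> str:
    """Demangle a tiny subset of Itanium C++ ABI names.

    Supported form: _ZN[K]<len><name>...E followed by 'v' (no parameters).
    Anything else raises ValueError. This is an illustrative sketch only.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("only nested names (_ZN...) are supported")
    pos = 3

    # 'K' after 'N' marks a const-qualified member function.
    is_const = symbol[pos:pos + 1] == "K"
    if is_const:
        pos += 1

    # Each name component is encoded as <decimal length><identifier>.
    parts = []
    while pos < len(symbol) and symbol[pos].isdigit():
        start = pos
        while pos < len(symbol) and symbol[pos].isdigit():
            pos += 1
        length = int(symbol[start:pos])
        name = symbol[pos:pos + length]
        if len(name) != length:
            raise ValueError("truncated name component")
        parts.append(name)
        pos += length

    # 'E' closes the nested name; 'v' means the function takes no parameters.
    if not parts or symbol[pos:pos + 1] != "E":
        raise ValueError("expected 'E' after the nested name")
    pos += 1
    if symbol[pos:] != "v":
        raise ValueError("only parameterless functions ('v') are supported")

    return "::".join(parts) + "()" + (" const" if is_const else "")


print(demangle_simple("_ZNK2at6Tensor7is_cudaEv"))
# at::Tensor::is_cuda() const
```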

byshiue, Aug 21 '22 23:08

I am using PyTorch 1.12.1+cu102, because the docs say "PyTorch: Verify on 1.8.0, >= 1.5.0 should work." I will drop to 1.8.0 and check.
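
Since a C++ extension generally needs to be built against the same PyTorch it is loaded with, a quick comparison of version strings can help here. The following is a rough sketch, not part of FasterTransformer: the helper names parse_torch_version and same_build are hypothetical, the build-time value below is a placeholder, and in practice the runtime value would come from torch.__version__.

```python
import re
from typing import NamedTuple, Optional, Tuple


class TorchVersion(NamedTuple):
    release: Tuple[int, int, int]  # e.g. (1, 12, 1)
    local: Optional[str]           # build tag such as "cu102", or None


def parse_torch_version(version: str) -> TorchVersion:
    """Split a version string like '1.12.1+cu102' into release and build tag."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)[^+]*(?:\+(.+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch, local = match.groups()
    return TorchVersion((int(major), int(minor), int(patch)), local)


def same_build(built_against: str, runtime: str) -> bool:
    """True only when both the release numbers and the build tag match."""
    return parse_torch_version(built_against) == parse_torch_version(runtime)


# Hypothetical build-time value; the runtime value is the one reported above.
built_against = "1.8.0"
runtime = "1.12.1+cu102"
if not same_build(built_against, runtime):
    print(f"mismatch: built against {built_against}, running {runtime}")
```

An exact match is stricter than necessary in some cases, but it is the simplest check that rules out a version mismatch as the cause.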

HemantTiwariGitHub, Aug 22 '22 14:08

Can you try the Docker image suggested in the documentation?

byshiue, Aug 23 '22 00:08

Hi, sorry, I am not planning to use the Docker image. Can you suggest a workaround or fix?

HemantTiwariGitHub, Aug 23 '22 11:08

The program cannot find the symbol for at::Tensor::is_cuda() const, which is a function in the PyTorch library.

However, we don't know the cause of this issue and we don't have a solution for it.
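
One way to narrow this down, offered as a sketch rather than a confirmed fix, is to check whether the PyTorch shared libraries in the active environment actually export the missing symbol. The helper exports_symbol below is hypothetical and uses only ctypes. The demo runs against the C library so that it is self-contained; for the real check you would pass the path of a library from the installed torch package's lib directory instead, which is an assumption about your setup.

```python
import ctypes
import ctypes.util


def exports_symbol(library_path, symbol: str) -> bool:
    """Return True if `symbol` can be resolved through the given shared library.

    The lookup goes through the dynamic loader, so a symbol provided by one of
    the library's own dependencies also counts as found.
    """
    lib = ctypes.CDLL(library_path)
    try:
        getattr(lib, symbol)  # raises AttributeError when the symbol is missing
    except AttributeError:
        return False
    return True


# Self-contained demo against the C library (not against PyTorch).
libc_path = ctypes.util.find_library("c")
print(exports_symbol(libc_path, "printf"))
print(exports_symbol(libc_path, "_ZNK2at6Tensor7is_cudaEv"))
```

If the symbol cannot be resolved in the runtime PyTorch libraries either, that would be consistent with the extension having been compiled against headers from a different PyTorch version than the one being loaded.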

byshiue, Aug 23 '22 23:08