extension-cpp

undefined symbol: THPVariableClass

Open jatentaki opened this issue 6 years ago • 14 comments

  • OS: Ubuntu 16.04
  • PyTorch version: 0.4.0
  • How you installed PyTorch (conda, pip, source): conda
  • Python version: 3.6
  • CUDA/cuDNN version: 9.0
  • GPU models and configuration: Titan XP

I built an extension based on this tutorial and it used to work. I then did some refactoring and fixes (in the CUDA/C++ code), and afterwards it started failing at runtime:

/home/jatentaki/anaconda3/lib/python3.6/site-packages/sort2_cuda-0.0.0-py3.6-linux-x86_64.egg/lltm_cpp.cpython-36m-x86_64-linux-gnu.so: undefined symbol: THPVariableClass

(for both the CUDA and cpp versions). I then checked whether the original example still worked and, to my surprise, it no longer does.

Timeline:

  • My initial success was on some 0.4.0 pre-release source build for cuda8.0.
  • I broke it
  • Trying to troubleshoot, I reinstalled conda and torch for the release 0.4.0 version, with cuda9.0
  • Neither my code nor your original example work

I believe the error just means I am not linking against some static library, but I don't see when and how I could have introduced that change.

jatentaki avatar May 01 '18 02:05 jatentaki

This often occurs when you import the extension before importing torch. Are you sure your imports are in this order:

import torch
import your_extension

Also, does this error occur when you import torch or import your_extension? Or does it fail when compiling the extension?

goldsborough avatar May 01 '18 02:05 goldsborough

Ok, I won't be able to test on the same machine before tomorrow, but the fix works on my personal laptop. Perhaps this should be mentioned in the tutorial? Maybe it's common setuptools knowledge, but it caught me off guard.

jatentaki avatar May 01 '18 03:05 jatentaki

It says it in the tutorial -- there is a line saying

Just be sure to import torch first, as this will resolve some symbols that the dynamic linker must see

It doesn't have anything to do with setuptools, it's just a dynamic linking issue. The torch module is a shared (dynamic) library which defines certain symbols that are unresolved in the extension library. To make these symbols available, the library containing the symbols (torch) must be imported before the library using them (your_extension) so that the dynamic linker can match the symbols with those from the torch library.
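
The import-order requirement described above can be enforced in one place rather than remembered at every call site. The following is only a minimal sketch: `import_after` is a hypothetical helper, not part of PyTorch or the tutorial, and it assumes that importing the dependency (here torch) is what makes its symbols visible to the dynamic linker.

```python
import importlib
import sys


def import_after(dependency: str, extension: str):
    """Import `extension` only once `dependency` is already loaded.

    Hypothetical helper: the names are illustrative, not a PyTorch API.
    """
    if dependency not in sys.modules:
        # For torch, importing it first loads the shared library that
        # defines the symbols the extension leaves unresolved.
        importlib.import_module(dependency)
    return importlib.import_module(extension)
```

With the tutorial's extension this would be called as `lltm_cpp = import_after("torch", "lltm_cpp")`, which behaves the same as writing `import torch` followed by `import lltm_cpp`.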

goldsborough avatar May 01 '18 04:05 goldsborough

I helped another user who made the same mistake. Maybe we can figure out a good way to give a better error message.
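
One possible shape for such a message, as a rough sketch only: `describe_import_failure` is a hypothetical function and not something PyTorch provides. It assumes the caller knows whether torch had been imported when the failure happened, and it only appends a hint to the linker's own text.

```python
def describe_import_failure(error: ImportError, torch_loaded: bool) -> str:
    """Append a hint to an 'undefined symbol' ImportError message.

    Hypothetical helper for illustration; not a PyTorch API.
    """
    message = str(error)
    if "undefined symbol" not in message:
        # Not a dynamic-linking failure, so there is nothing to add.
        return message
    if not torch_loaded:
        return message + "\nHint: import torch before importing the extension."
    return (
        message
        + "\nHint: torch is already imported; the extension may have been "
        "built against a different PyTorch version or compiler ABI."
    )
```

The two hints correspond to the two situations reported in this thread: the extension imported before torch, and the error persisting even with the correct import order.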

ezyang avatar Sep 06 '18 14:09 ezyang

@ezyang I'll think of something

goldsborough avatar Sep 06 '18 15:09 goldsborough

I'm having a similar error, and importing torch before the extension doesn't solve it. Version info:

Pytorch version: 0.4.1
CUDA version: 8.0
GCC version: 5.2.0

Error stack:

>>> import torch
>>> import modules
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/graphics/toyota-pytorch/inplace_abn/modules/__init__.py", line 2, in <module>
    from .bn import ABN, InPlaceABN, InPlaceABNSync
  File "/data/graphics/toyota-pytorch/inplace_abn/modules/bn.py", line 10, in <module>
    from .functions import *
  File "/data/graphics/toyota-pytorch/inplace_abn/modules/functions.py", line 17, in <module>
    extra_cuda_cflags=["--expt-extended-lambda"])
  File "/afs/csail.mit.edu/u/s/smadan/miniconda3/envs/test/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 494, in load
    with_cuda=with_cuda)
  File "/afs/csail.mit.edu/u/s/smadan/miniconda3/envs/test/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 670, in _jit_compile
    return _import_module_from_library(name, build_directory)
  File "/afs/csail.mit.edu/u/s/smadan/miniconda3/envs/test/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 753, in _import_module_from_library
    return imp.load_module(module_name, file, path, description)
  File "/afs/csail.mit.edu/u/s/smadan/miniconda3/envs/test/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/afs/csail.mit.edu/u/s/smadan/miniconda3/envs/test/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: /tmp/torch_extensions/inplace_abn/inplace_abn.so: undefined symbol: _ZN2at5ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
>>> 

The code base I'm trying to run when the error occurs: https://github.com/mapillary/inplace_abn

Any leads on what I should try?

Spandan-Madan avatar Sep 26 '18 06:09 Spandan-Madan

@Spandan-Madan this is basically failing on an ABI incompatibility (binaries built with gcc > 5.1 have a different std::string ABI than binaries built with gcc <= 5.1).

For this, we (pytorch) have a patch in 0.4.1 that sets a flag to compile the cpp-extension with _GLIBCXX_USE_CXX11_ABI=0 (see https://github.com/pytorch/pytorch/commit/f08f222db3b23e925754ee29c882cec0c7da461e ).
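
The mismatch is visible in the undefined symbol itself. libstdc++ places the newer std::string in the inline namespace `std::__cxx11`, so a mangled name containing `__cxx11` was produced by code compiled with the new ABI, while a library built with `_GLIBCXX_USE_CXX11_ABI=0` will not export it. A small sketch of that check follows; `built_with_cxx11_abi` is a hypothetical helper that only inspects the symbol text and does not query the compiler or PyTorch.

```python
CXX11_ABI_MARKER = "__cxx11"


def built_with_cxx11_abi(mangled_symbol: str) -> bool:
    """Return True if a mangled C++ symbol refers to the new libstdc++ ABI.

    Hypothetical helper: a plain substring check on the symbol name.
    """
    return CXX11_ABI_MARKER in mangled_symbol


# The symbol reported in the traceback earlier in this thread.
symbol = (
    "_ZN2at5ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_string"
    "IcSt11char_traitsIcESaIcEEE"
)
print(built_with_cxx11_abi(symbol))
```

A symbol such as `THPVariableClass` carries no such marker, which fits the earlier diagnosis that that error comes from import order rather than from the string ABI.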

Did you build the extension with pytorch-master and switch back to pytorch-0.4.1 (or something of that sort)?

soumith avatar Sep 27 '18 03:09 soumith

Thanks for the reply @soumith.

I am using an extension present in the folder modules here in this repo: https://github.com/mapillary/inplace_abn

I installed Pytorch using conda (both normal and your channel), but I get this error in both.

Any leads on what I should try would be helpful. I've tried both GCC 4.8 and 5.2, and the error persists.

Thanks in advance :)

Spandan-Madan avatar Sep 27 '18 18:09 Spandan-Madan

@Spandan-Madan I had the same problem, and I solved it by switching gcc to 4.9.2. Hope this helps.

gmyofustc avatar Oct 05 '18 03:10 gmyofustc

@Spandan-Madan Hi, have you solved the problem yet? I ran into the same problem and tried gcc 4.8, gcc 4.9 and gcc 5.4, but all failed. The error still exists. My PyTorch version is 0.4.1.

ChujunWhu avatar Mar 25 '19 03:03 ChujunWhu

This often occurs when you import the extension before import torch. Are you sure the order you are importing is:

import torch
import your_extension

Also, does this error occur when you import torch or import your_extension? Or does it fail when compiling the extension?

Hello, I have the same problem. I import torch before importing _C, but the error still occurs. Could you help me?

etoilestar avatar Jun 07 '20 10:06 etoilestar

I suspect the underlying error is https://github.com/pytorch/pytorch/issues/38122.

heiner avatar Jul 20 '20 19:07 heiner

Could you please check https://github.com/daniilidis-group/neural_renderer/issues/92 and https://github.com/daniilidis-group/neural_renderer/issues/93

I was able to reproduce this error for two repos.

$ python
Python 3.7.6 (default, Jan  8 2020, 19:59:22) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.6.0'
>>> torch.version.cuda
'10.1'
>>> torch.cuda.is_available()
True


$ gcc --version
gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.1 LTS
Release:	20.04
Codename:	focal



monajalal avatar Sep 23 '20 22:09 monajalal

@goldsborough

here is the code I am trying to run:


"""
Example 1. Drawing a teapot from multiple viewpoints.
"""
import os
import argparse

import torch
import numpy as np
import tqdm
import imageio

import neural_renderer as nr

I'm not sure why it throws this error:

(base) mona@mona:~/research/3danimals/neural_renderer/examples$ python example1.py 
Traceback (most recent call last):
  File "example1.py", line 12, in <module>
    import neural_renderer as nr
  File "/home/mona/anaconda3/lib/python3.7/site-packages/neural_renderer/__init__.py", line 3, in <module>
    from .load_obj import load_obj
  File "/home/mona/anaconda3/lib/python3.7/site-packages/neural_renderer/load_obj.py", line 8, in <module>
    import neural_renderer.cuda.load_textures as load_textures_cuda
ImportError: /home/mona/anaconda3/lib/python3.7/site-packages/neural_renderer/cuda/load_textures.cpython-37m-x86_64-linux-gnu.so: undefined symbol: THPVariableClass

https://github.com/daniilidis-group/neural_renderer/issues/93

monajalal avatar Sep 23 '20 23:09 monajalal