
Build failure for v0.10.2 in nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04

Status: Open. Lokiiiiii opened this issue 2 years ago • 17 comments

Repro Instructions:

git clone --recursive https://github.com/pytorch/audio.git
cd audio
git checkout v0.10.2
export USE_CUDA=1
export BUILD_SOX=1
CC=gcc-9 CXX=g++-9 python3.8 setup.py bdist_wheel
python3.8 -m pip install dist/*.whl
python3.8 -c 'import torchaudio'

Issue

The following warnings are displayed on import:

/audio/torchaudio/_extension.py:11: UserWarning: torchaudio C++ extension is not available.
  warnings.warn('torchaudio C++ extension is not available.')
/audio/torchaudio/backend/utils.py:67: UserWarning: No audio backend is available.
  warnings.warn('No audio backend is available.')

Logs

Build log

Lokiiiiii, Mar 23 '22 18:03

Hi @Lokiiiiii

Can you try python3.8 -c 'import torchaudio' outside of the cloned directory and see if the warning is gone?

The build log seems to be fine. I do not see any issue/failure.

Since the installation command is pip install dist/*.whl, the built binary is installed into a location like ../site-packages/torchaudio/.... However, when you run python3.8 -c 'import torchaudio' inside the cloned repo, the source directory shadows the installed package, so Python imports the source directory, which does not contain the built extension.
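One way to confirm which copy Python picks up is to ask importlib where the module resolves, without executing the package. A minimal sketch; `import_origin` and `is_shadowed_by_cwd` are hypothetical helper names. Note that the result depends on sys.path: with `python -c`, the current directory comes first, which is exactly the shadowing case described above.

```python
import importlib.util
import os


def import_origin(module_name):
    """Return the file Python would load for module_name, or None if not found.

    find_spec locates a top-level module without running its __init__.py,
    so this works even when actually importing the package would fail.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


def is_shadowed_by_cwd(module_name):
    """True if module_name resolves to a path under the current working directory."""
    origin = import_origin(module_name)
    if origin is None:
        return False
    cwd = os.path.realpath(os.getcwd())
    return os.path.commonpath([cwd, os.path.realpath(origin)]) == cwd


if __name__ == "__main__":
    print("torchaudio resolves to:", import_origin("torchaudio"))
    print("shadowed by current directory:", is_shadowed_by_cwd("torchaudio"))
```

Run it once from inside the cloned repo and once from another directory; if the first run points into the repo's torchaudio/ folder, the source tree is shadowing the installed wheel.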

mthrok, Mar 23 '22 19:03

Thanks. However, I am still seeing an error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/torchaudio/__init__.py", line 1, in <module>
    from torchaudio import _extension  # noqa: F401
  File "/opt/conda/lib/python3.8/site-packages/torchaudio/_extension.py", line 27, in <module>
    _init_extension()
  File "/opt/conda/lib/python3.8/site-packages/torchaudio/_extension.py", line 21, in _init_extension
    torch.ops.load_library(path)
  File "/opt/conda/lib/python3.8/site-packages/torch/_ops.py", line 110, in load_library
    ctypes.CDLL(path)
  File "/opt/conda/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /opt/conda/lib/python3.8/site-packages/torchaudio/lib/libtorchaudio.so: undefined symbol: _ZNK5torch8autograd4Node4nameB5cxx11Ev

Lokiiiiii, Mar 23 '22 19:03

I see a couple of issues.

OSError: /opt/conda/lib/python3.8/site-packages/torchaudio/lib/libtorchaudio.so: undefined symbol: _ZNK5torch8autograd4Node4nameB5cxx11Ev

In general, this error happens when PyTorch binary and torchaudio binary do not match. torchaudio needs to use the matching extension module (the module written in C++ and compiled).

However, the path where the issue happens, torchaudio/lib/libtorchaudio.so, indicates this is version 0.11. The version you built and installed is 0.10.2, which is supposed to have torchaudio/_torchaudio.so. (The build log you linked also confirms this.)

So I suggest uninstalling every copy of torchaudio in your environment (repeat pip uninstall torchaudio and conda uninstall torchaudio until none is left), then making sure you have the version of PyTorch you want to use (and that it is the only version in the environment), and then building torchaudio again.
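To check that the environment is clean before rebuilding, the installed distributions can be listed from Python. A minimal sketch using the standard-library importlib.metadata (Python 3.8+); `installed_versions` is a hypothetical helper. It assumes each install left dist-info or egg-info metadata on sys.path, which pip installs do; whether a given conda package does is worth verifying separately.

```python
import re
from importlib import metadata


def _normalize(name):
    # PEP 503 normalization: case-insensitive, runs of "-", "_", "." are equivalent.
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_versions(dist_name):
    """Return (version, location) for every installed distribution named dist_name.

    More than one entry means several copies are visible on sys.path,
    which is the situation the uninstall-until-clean advice targets.
    """
    target = _normalize(dist_name)
    found = []
    for dist in metadata.distributions():
        meta = dist.metadata
        name = (meta.get("Name") if meta is not None else None) or ""
        if _normalize(name) == target:
            found.append((dist.version, str(dist.locate_file(""))))
    return found


if __name__ == "__main__":
    for pkg in ("torch", "torchaudio"):
        print(pkg, installed_versions(pkg))
```

Ideally each package prints exactly one entry before torchaudio is rebuilt.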

mthrok, Mar 23 '22 19:03

I just did a complete build from source on a fresh environment. I built torch, torch_xla, torchvision and finally torchaudio all from source. Still getting the same error:

Environment

root@1c84aa84490d:/# pip list | grep torch
torch                       1.10.2
torch-xla                   1.10.0
torchaudio                  0.10.1+6f539cf
torchvision                 0.11.0a0+05eae32

Error

root@1c84aa84490d:/# python -c 'import torch, torch_xla, torchvision, torchaudio'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/torchaudio/__init__.py", line 1, in <module>
    from torchaudio import _extension  # noqa: F401
  File "/opt/conda/lib/python3.8/site-packages/torchaudio/_extension.py", line 27, in <module>
    _init_extension()
  File "/opt/conda/lib/python3.8/site-packages/torchaudio/_extension.py", line 21, in _init_extension
    torch.ops.load_library(path)
  File "/opt/conda/lib/python3.8/site-packages/torch/_ops.py", line 110, in load_library
    ctypes.CDLL(path)
  File "/opt/conda/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /opt/conda/lib/python3.8/site-packages/torchaudio/lib/libtorchaudio.so: undefined symbol: _ZNK5torch8autograd4Node4nameB5cxx11Ev

Lokiiiiii, Mar 23 '22 23:03

Hmm, every case of this kind of error I have seen so far was basically a PyTorch version mismatch. If your environment has only one version of PyTorch, that does not explain it.

Something interesting about the error message is that the symbol ends with cxx11Ev. It might be an ABI mismatch. (There is an interesting report about it: https://github.com/linkinpark213/linkinpark213.github.io/issues/12#issuecomment-456251560)

My next hypothesis is that PyTorch and torchaudio are compiled with different ABI settings (different values of _GLIBCXX_USE_CXX11_ABI, or possibly different compilers), although I would expect torchaudio compilation to fail in that case.
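The B5cxx11 fragment in the missing symbol is the mangled form of GCC's [abi:cxx11] tag, which GCC attaches to functions whose signature involves the new-ABI std::string (here, the return type of Node::name()); `_ZNK5torch8autograd4Node4nameB5cxx11Ev` demangles to `torch::autograd::Node::name[abi:cxx11]() const`. A minimal sketch for telling the two variants apart; both helper names are hypothetical, and the tag stripping is plain text removal, not a general demangler.

```python
CXX11_TAG = "B5cxx11"  # mangled encoding of GCC's [abi:cxx11] tag


def has_cxx11_abi_tag(symbol):
    """True if the mangled name carries the [abi:cxx11] tag (new C++11 ABI)."""
    return CXX11_TAG in symbol


def strip_cxx11_abi_tag(symbol):
    """Return the name as it would look under the old (pre-C++11) ABI.

    Simplification: only deletes the tag text, which is enough for names
    like the one in this thread but not for arbitrary symbols.
    """
    return symbol.replace(CXX11_TAG, "")


if __name__ == "__main__":
    wanted = "_ZNK5torch8autograd4Node4nameB5cxx11Ev"
    print(has_cxx11_abi_tag(wanted), strip_cxx11_abi_tag(wanted))
```

If libtorchaudio.so asks for the tagged name but libtorch only defines the untagged one, the two libraries were built with different _GLIBCXX_USE_CXX11_ABI values.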

If you run the following command, what do you get? It tries to find a symbol with torch8autograd4Node4name from PyTorch C++ library files.

nm `python -c 'import torch;print("/".join(torch.__file__.split("/")[:-1]))'`/lib/libtorch* | grep torch8autograd4Node4name
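The same lookup can be scripted in Python, keeping defined (T) and undefined (U) entries apart per library. A minimal sketch, assuming `nm` is on PATH and the libraries still carry a regular symbol table (for stripped libraries `nm -D` would be needed); `parse_nm` and `scan_torch_libs` are hypothetical helpers.

```python
import glob
import os
import subprocess


def parse_nm(output, needle):
    """Yield (type, name) for nm output lines whose symbol contains needle.

    nm prints "ADDRESS TYPE NAME" for defined symbols and "TYPE NAME"
    (blank address) for undefined ones, so the last two fields are used.
    """
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and needle in fields[-1]:
            yield fields[-2], fields[-1]


def scan_torch_libs(needle):
    """Run nm on each libtorch* file next to the installed torch package."""
    import torch  # imported lazily so parse_nm is usable without torch

    lib_dir = os.path.join(os.path.dirname(torch.__file__), "lib")
    results = []
    for path in sorted(glob.glob(os.path.join(lib_dir, "libtorch*"))):
        # No check=True: nm may complain about files it cannot parse.
        out = subprocess.run(["nm", path], capture_output=True, text=True).stdout
        for sym_type, name in parse_nm(out, needle):
            results.append((os.path.basename(path), sym_type, name))
    return results


if __name__ == "__main__":
    for row in scan_torch_libs("torch8autograd4Node4name"):
        print(*row)
```

Unlike the one-liner, this also reports which library each match comes from.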

mthrok, Mar 24 '22 00:03

root@1c84aa84490d:/# nm `python -c 'import torch;print("/".join(torch.__file__.split("/")[:-1]))'`/lib/libtorch* | grep torch8autograd4Node4name
0000000003fbe140 T _ZNK5torch8autograd4Node4nameEv
                 U _ZNK5torch8autograd4Node4nameEv

I will do a fresh build of torchaudio, explicitly setting _GLIBCXX_USE_CXX11_ABI=0. I am building both torch and torchaudio with gcc-9.

Lokiiiiii, Mar 24 '22 01:03

The issue persists after building with:

CFLAGS="${CFLAGS} -D_GLIBCXX_USE_CXX11_ABI=0" CXXFLAGS="${CXXFLAGS} -D_GLIBCXX_USE_CXX11_ABI=0" CC="gcc-9" CXX="g++-9" BUILD_SOX=1 python setup.py bdist_wheel

Lokiiiiii, Mar 24 '22 02:03


Okay, at least we now know that torchaudio expects a PyTorch library compiled with the cxx11 ABI (the B5cxx11-tagged symbol), while your PyTorch only defines the untagged symbol, which is why it's failing.
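That comparison can be made mechanical: take the symbols libtorchaudio.so needs, the symbols libtorch defines, and flag names that are only available without the cxx11 tag. A minimal sketch; `diagnose_missing` is a hypothetical helper, and gathering the two symbol lists (for example from `nm` output) is left to the caller.

```python
CXX11_TAG = "B5cxx11"  # mangled encoding of GCC's [abi:cxx11] tag


def diagnose_missing(required, provided):
    """Classify symbols in `required` that `provided` does not define.

    Maps each missing symbol to "abi-mismatch" when the provider defines
    the same name without the [abi:cxx11] tag, otherwise to "missing".
    """
    provided = set(provided)
    report = {}
    for sym in required:
        if sym in provided:
            continue
        if CXX11_TAG in sym and sym.replace(CXX11_TAG, "") in provided:
            report[sym] = "abi-mismatch"
        else:
            report[sym] = "missing"
    return report


if __name__ == "__main__":
    needed = ["_ZNK5torch8autograd4Node4nameB5cxx11Ev"]
    defined = ["_ZNK5torch8autograd4Node4nameEv"]
    print(diagnose_missing(needed, defined))
```

With the symbols from this thread, the single missing name is classified as an ABI mismatch rather than a version mismatch.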


Can you check the build log and see if the value of _GLIBCXX_USE_CXX11_ABI is reflected? torchaudio's build process is supposed to detect the configuration PyTorch was compiled with.

https://github.com/pytorch/audio/blob/6f539cf3edc4224b51798e962ca28519e5479ffb/CMakeLists.txt#L125-L126

So it should not be necessary to set the flag manually. However, this part is not well tested, so this might be a bug.
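Build logs are long, so one quick check is to collect every value the ABI define appears with. A minimal sketch, assuming the log records full compiler command lines (as the verbose logs in this thread do); `abi_values_in_log` is a hypothetical helper.

```python
import re

# Matches e.g. "-D_GLIBCXX_USE_CXX11_ABI=0" on a compiler command line.
ABI_FLAG = re.compile(r"-D_GLIBCXX_USE_CXX11_ABI=(\d)")


def abi_values_in_log(log_text):
    """Return the set of _GLIBCXX_USE_CXX11_ABI values seen in the log.

    {"0"} or {"1"} means the setting was applied consistently;
    {"0", "1"} means objects were compiled with both ABIs.
    """
    return set(ABI_FLAG.findall(log_text))


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as f:
        print(abi_values_in_log(f.read()))
```

An empty set means the define never appeared on a command line, so the compiler default applied.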

mthrok, Mar 24 '22 02:03

Build logs @ https://pastebin.com/d4X6WCrn indicate _GLIBCXX_USE_CXX11_ABI=0 is reflected.

Lokiiiiii, Mar 24 '22 03:03

Hmm, if that's the case, I have no idea what is causing the error. For reference, I tried compiling v0.10.2 on nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04 and it worked fine.

https://gist.github.com/mthrok/de45e817a1b5f9475bcdac0cee464de6

mthrok, Mar 24 '22 14:03

I can repro your build when using a pre-built torch binary. But the build starts failing when I install torch from source.

The first error I hit is:

gcc -DHAVE_CONFIG_H  -I. -I/audio/build/temp.linux-x86_64-3.8/third_party/sox/src/lame/frontend -I.. -I/audio/build/temp.linux-x86_64-3.8/third_party/sox/src/lame/libmp3lame -I/audio/build/temp.linux-x86_64-3.8/third_party/sox/src/lame/include -I..    -Wall -pipe -I/audio/third_party/sox/../install/include -fvisibility=hidden  -D_GLIBCXX_USE_CXX11_ABI=0 -c /audio/build/temp.linux-x86_64-3.8/third_party/sox/src/lame/frontend/console.c
/bin/bash: /root/anaconda3/envs/prod/lib/libtinfo.so.6: no version information available (required by /bin/bash)
/audio/build/temp.linux-x86_64-3.8/third_party/sox/src/lame/frontend/console.c:25:11: fatal error: curses.h: No such file or directory
   25 | # include <curses.h>
      |           ^~~~~~~~~~
compilation terminated.
make[2]: *** [Makefile:396: console.o] Error 1
make[2]: Leaving directory '/audio/build/temp.linux-x86_64-3.8/third_party/sox/src/lame-build/frontend'
make[1]: *** [Makefile:349: all-recursive] Error 1
make[1]: Leaving directory '/audio/build/temp.linux-x86_64-3.8/third_party/sox/src/lame-build'

I can get past that with apt-get install libncurses-dev. I am currently working on this one:

libtool: link: gcc -Wall -pipe -I/audio/third_party/sox/../install/include -fvisibility=hidden -D_GLIBCXX_USE_CXX11_ABI=0 -o lame lame_main.o main.o brhist.o console.o get_audio.o lametime.o parse.o timestatus.o  -L/audio/third_party/sox/../install/lib ../libmp3lame/.libs/libmp3lame.a -lncurses -lm
/usr/bin/ld: console.o: in function `get_termcap_string':
console.c:(.text+0x102): undefined reference to `tgetstr'
/usr/bin/ld: console.o: in function `get_termcap_number':
console.c:(.text+0x183): undefined reference to `tgetnum'
/usr/bin/ld: console.o: in function `apply_termcap_settings':
console.c:(.text+0x20a): undefined reference to `tgetent'
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:358: lame] Error 1
make[2]: Leaving directory '/audio/build/temp.linux-x86_64-3.8/third_party/sox/src/lame-build/frontend'
make[1]: *** [Makefile:349: all-recursive] Error 1
make[1]: Leaving directory '/audio/build/temp.linux-x86_64-3.8/third_party/sox/src/lame-build'
make: *** [Makefile:276: all] Error 2

CMake Error at /audio/build/temp.linux-x86_64-3.8/third_party/sox/src/lame-stamp/lame-build-Release.cmake:47 (message):
  Stopping after outputting logs.

Lokiiiiii, Mar 24 '22 18:03

Repro Instructions

conda create -y --name py38 python=3.8 anaconda
conda activate py38
conda install -y numpy pyyaml mkl-include setuptools cmake cffi typing tqdm coverage tensorboard hypothesis dataclasses
export CFLAGS="${CFLAGS} -D_GLIBCXX_USE_CXX11_ABI=0"
export CXXFLAGS="${CXXFLAGS} -D_GLIBCXX_USE_CXX11_ABI=0"
export USE_CUDA=1

git clone https://github.com/pytorch/pytorch.git
pushd pytorch && git checkout v1.10.2
git submodule update --init --recursive
sed -i 's/set(CUDA_PROPAGATE_HOST_FLAGS OFF)//g' third_party/gloo/cmake/Cuda.cmake
USE_SYSTEM_NCCL=1 python setup.py install

git clone https://github.com/pytorch/audio.git
pip install ninja
pushd audio && git checkout v0.10.2
BUILD_SOX=1 python setup.py install

Lokiiiiii, Mar 24 '22 22:03

I can confirm this only happens with -D_GLIBCXX_USE_CXX11_ABI=0.

Lokiiiiii, Mar 25 '22 17:03

Hi @Lokiiiiii

Sorry for the late reply.

export CFLAGS="${CFLAGS} -D_GLIBCXX_USE_CXX11_ABI=0"
export CXXFLAGS="${CXXFLAGS} -D_GLIBCXX_USE_CXX11_ABI=0"

This does not seem to be the proper way to configure the CXX11 ABI in PyTorch. I think the proper way is to set the _GLIBCXX_USE_CXX11_ABI environment variable.

https://github.com/pytorch/pytorch/blob/38a758e25178d70362c1a2d900e9f7c27e70af28/tools/setup_helpers/cmake.py#L243-L252

TorchAudio obtains the CXX11 ABI setting from the TORCH_CXX_FLAGS variable in CMake, which is propagated from the _GLIBCXX_USE_CXX11_ABI environment variable.

Can you try building PyTorch with setting the _GLIBCXX_USE_CXX11_ABI environment variable instead of manipulating CFLAGS and CXXFLAGS?
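A minimal sketch of that suggestion driven from Python, assuming a PyTorch source checkout in `src_dir`; `build_env` and `build_pytorch` are hypothetical helpers. Removing any `-D_GLIBCXX_USE_CXX11_ABI` entry from CFLAGS/CXXFLAGS is this sketch's own choice, made so the two sources of the setting cannot disagree.

```python
import os
import subprocess
import sys


def build_env(use_cxx11_abi, base=None):
    """Return a copy of the environment with _GLIBCXX_USE_CXX11_ABI set.

    Assumption (from the comment above): PyTorch's setup.py reads this
    variable and records the choice in TORCH_CXX_FLAGS for torchaudio.
    """
    env = dict(os.environ if base is None else base)
    env["_GLIBCXX_USE_CXX11_ABI"] = "1" if use_cxx11_abi else "0"
    # Drop manual ABI defines so only the environment variable decides.
    for var in ("CFLAGS", "CXXFLAGS"):
        if var in env:
            kept = [f for f in env[var].split()
                    if not f.startswith("-D_GLIBCXX_USE_CXX11_ABI")]
            env[var] = " ".join(kept)
    return env


def build_pytorch(src_dir, use_cxx11_abi):
    """Run `python setup.py install` in src_dir with the chosen ABI."""
    subprocess.run([sys.executable, "setup.py", "install"],
                   cwd=src_dir, env=build_env(use_cxx11_abi), check=True)


if __name__ == "__main__":
    build_pytorch("pytorch", use_cxx11_abi=False)
```

torchaudio would then be built afterwards in the same environment, without any ABI flags of its own.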

mthrok, Apr 08 '22 05:04

Can we do something like https://github.com/pytorch/FBGEMM/blob/0e24712210b44a3adf3832f9f9bfb1e486d81f4f/fbgemm_gpu/setup.py#L50 when building the torchaudio binary? fwiw, pytorch nightly is built with GLIBCXX_USE_CXX11_ABI=0. See more discussion at https://github.com/pytorch/pytorch/pull/100262#issuecomment-1542140819
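Along those lines, a setup script could derive the define from the PyTorch it is building against instead of relying on the user's environment. A minimal sketch, not a copy of the linked FBGEMM code; `abi_define` is a hypothetical helper, while `torch.compiled_with_cxx11_abi()` is an existing PyTorch function that reports whether PyTorch was built with `_GLIBCXX_USE_CXX11_ABI=1`.

```python
def abi_define(use_cxx11_abi=None):
    """Return the -D flag matching the ABI of the installed PyTorch.

    When use_cxx11_abi is None, ask torch directly; passing a bool
    explicitly lets the helper be used without torch installed.
    """
    if use_cxx11_abi is None:
        import torch

        use_cxx11_abi = torch.compiled_with_cxx11_abi()
    return "-D_GLIBCXX_USE_CXX11_ABI=" + ("1" if use_cxx11_abi else "0")


if __name__ == "__main__":
    print(abi_define())
```

The returned string could then be appended to the extension's compile arguments (for example `extra_compile_args` of a setuptools `Extension`), so torchaudio always matches whatever PyTorch is installed.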

desertfire, May 10 '23 18:05


These lines were meant to cover that, but I guess they are no longer working: https://github.com/pytorch/audio/blob/4463fbdfbbc29fbc78d5dcd4f61cd9d0a806432c/CMakeLists.txt#L130-L139

mthrok, May 10 '23 19:05


Probably. torchtext also does this, https://github.com/pytorch/text/blob/b0ebddc648d279826089db91775375221777a2db/tools/setup_helpers/extension.py#LL25C37-L25C37

desertfire, May 10 '23 20:05