xformers
Pip installation failing with 'command '/usr/local/cuda/bin/nvcc' failed with exit code 255'
🐛 Bug
Pip installation fails on Amazon EC2 Instance (Amazon Linux) with confusing error: command '/usr/local/cuda/bin/nvcc' failed with exit code 255
Command
To install, I do the following:
git clone https://github.com/facebookresearch/xformers.git
cd xformers
git submodule update --init --recursive
pip install -r requirements.txt
pip install -e -v .
The error occurs on pip install -e -v .
Note that this process also takes a very long time (had to leave it overnight).
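Since the failure ultimately comes from the compiler toolchain rather than pip, a pre-flight check of gcc/nvcc versions can save an overnight build. A minimal sketch (the minimum versions are illustrative assumptions, not official requirements; it assumes GNU `sort -V` is available):

```shell
# Hypothetical pre-flight check before a long source build.
# version_ge A B -> succeeds when A >= B, compared as dotted version numbers.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# In a real run you would extract the versions from the tools themselves, e.g.:
#   nvcc_ver=$(nvcc --version | sed -n 's/.*release \([0-9.]*\),.*/\1/p')
#   gcc_ver=$(gcc -dumpversion)
# Here we check the versions reported in this issue:
nvcc_ver=11.6
gcc_ver=7.3.1

version_ge "$nvcc_ver" 11.0 && echo "nvcc ok"
version_ge "$gcc_ver" 9.0 || echo "gcc may be too old for some CUDA 11 sources"
```

Running the two checks up front at least surfaces a toolchain mismatch in seconds instead of hours into the compile.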
To Reproduce
Steps to reproduce the behavior:
See above. I am following these steps in my fork of the stable-diffusion-webui repo and in accordance with these instructions.
Please find the stack trace below, which I see after running pip install -e -v .:
:522: internal compiler error: in maybe_undo_parenthesized_ref, at cp/semantics.c:1739
BOOL_SWITCH(launch_params.is_dropout, IsDropoutConst, [&] {
^
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
Preprocessed source stored into /tmp/ccU5sQOR.out file, please attach this to your bugreport.
error: command '/usr/local/cuda/bin/nvcc' failed with exit code 255
error: subprocess-exited-with-error
× python setup.py develop did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /home/ec2-user/ls-stable-diffusion/venv/bin/python3.9 -c '
exec(compile('"'"''"'"''"'"'
# This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
#
# - It imports setuptools before invoking setup.py, to enable projects that directly
# import from `distutils.core` to work with newer packaging standards.
# - It provides a clear error message when setuptools is not installed.
# - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
# setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
# manifest_maker: standard file '"'"'-c'"'"' not found".
# - It generates a shim setup.py, for handling setup.cfg-only projects.
import os, sys, tokenize
try:
import setuptools
except ImportError as error:
print(
"ERROR: Can not execute `setup.py` since setuptools is not available in "
"the build environment.",
file=sys.stderr,
)
sys.exit(1)
__file__ = %r
sys.argv[0] = __file__
if os.path.exists(__file__):
filename = __file__
with tokenize.open(__file__) as f:
setup_py_code = f.read()
else:
filename = "<auto-generated setuptools caller>"
setup_py_code = "from setuptools import setup; setup()"
exec(compile(setup_py_code, filename, "exec"))
'"'"''"'"''"'"' % ('"'"'/home/ec2-user/ls-stable-diffusion/repositories/xformers/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' develop --no-deps
cwd: /home/ec2-user/ls-stable-diffusion/repositories/xformers/
Expected behavior
Expected successful installation.
Environment
PyTorch version: 1.12.1+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A
Installed via: pip
Build command used: pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

OS: Amazon Linux 2 (x86_64)
GCC version: (GCC) 7.3.1 20180712 (Red Hat 7.3.1-15)
Clang version: Could not collect
CMake version: version 2.8.12.2
Libc version: glibc-2.26

Python version: 3.9.10 (main, Sep 20 2022, 12:57:09) [GCC 7.3.1 20180712 (Red Hat 7.3.1-15)] (64-bit runtime)
Python platform: Linux-4.14.290-217.505.amzn2.x86_64-x86_64-with-glibc2.26
Is CUDA available: True
CUDA runtime version: 11.6.124
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 510.73.08
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.3
[pip3] pytorch-lightning==1.7.6
[pip3] torch==1.12.1+cu116
[pip3] torchaudio==0.12.1+cu116
[pip3] torchdiffeq==0.2.3
[pip3] torchmetrics==0.9.3
[pip3] torchvision==0.13.1+cu116
[conda] Could not collect
Additional context
Thank you.
Hi,
~The error appears because pip can't find a CUDA compiler (nvcc) to compile the xformers extensions.~
If it is possible for you, we have made conda binaries available, which let you install xformers with conda install -c "xformers/label/dev" xformers (https://anaconda.org/xformers/xformers). This should make things much easier since, as you noticed, compiling our CUDA extensions takes a long time.
EDIT: it looks like it was a compilation issue with nvcc, appearing in the flash-attention implementation if I'm not mistaken... What is your nvcc version?
If this issue only appears on the build of flash attention, you can disable its build with:
XFORMERS_DISABLE_FLASH_ATTN=1 pip install -e -v .
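For context on what that variable does: an opt-out switch like this is typically read in `setup.py` as a simple truthiness check before the flash-attention extension is added to the build. An illustrative sketch (not the actual xformers setup code):

```python
import os

def flash_attention_enabled(environ=os.environ) -> bool:
    """Sketch of an env-var build switch: any non-empty value disables the build."""
    return not environ.get("XFORMERS_DISABLE_FLASH_ATTN")

# With the variable set as in the command above, the extension would be skipped:
print(flash_attention_enabled({"XFORMERS_DISABLE_FLASH_ATTN": "1"}))  # False
print(flash_attention_enabled({}))                                    # True
```

Note that under this convention the value itself is not parsed, so `=0` would still disable the build; setting `=1` is the unambiguous choice.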
cc @tridao are you aware of "internal compiler error" issues with nvcc when building Flash?
I think I've seen it. I haven't figured out the cause, but I think it's some combination of gcc version and nvcc version.
Still haven't resolved this issue. Hopefully this info provides more clarity:
GCC version:
(GCC) 7.3.1 20180712 (Red Hat 7.3.1-15)
Cuda version:
Cuda compilation tools, release 11.6, V11.6.124 Build cuda_11.6.r11.6/compiler.31057947_0
Have you tried disabling Flash Attention as I suggested above?
> Have you tried disabling Flash Attention as I suggested above?

Yes. Now with gcc 11.3.0 and nvcc cuda_11.5.r11.5, running XFORMERS_DISABLE_FLASH_ATTN=1 FORCE_CUDA=1 pip install --require-virtualenv git+https://github.com/facebookresearch/xformers.git@main#egg=xformers gives:
/venv/lib/python3.10/site-packages/torch/include/c10/core/SymInt.h(84): warning #68-D: integer conversion resulted in a change of sign
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
435 | function(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
530 | operator=(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
error: command '/usr/bin/nvcc' failed with exit code 255
[end of output]
I don't recommend doing this*, however:
nvcc --version:
Cuda compilation tools, release 11.8, V11.8.89 Build cuda_11.8.r11.8/compiler.31833905_0
gcc --version:
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
successfully completes and xformers works:
XFORMERS_DISABLE_FLASH_ATTN=1 FORCE_CUDA=1 pip install --require-virtualenv git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
[...]
Successfully built xformers
Installing collected packages: xformers
Successfully installed xformers-0.0.14.dev0
hope this helps someone!
[*] I had to apt purge ubuntu-desktop and xserver-xorg-nouveau, and then manually add and purge cuda packages until all of the versions matched. However, even though /etc/alternatives/cuda -> /usr/local/cuda-11.8, which nvcc came back blank, so I had to export PATH=$PATH:/usr/local/cuda/bin.
There may have been other steps; the machine crashed on reboot once and I had to wait for someone on-site to hit the reset switch.
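The blank `which nvcc` above is purely a PATH problem: the toolkit was installed, but its bin directory was not on PATH. A small sketch of the fix with a portable containment check (the /usr/local/cuda path assumes the default symlink):

```shell
# Append the CUDA toolkit's bin directory (default symlink location) to PATH.
export PATH="$PATH:/usr/local/cuda/bin"

# Portable containment check: wrap PATH in colons so each entry matches exactly,
# avoiding false positives on substrings of longer paths.
case ":$PATH:" in
    *:/usr/local/cuda/bin:*) echo "cuda bin on PATH" ;;
    *)                       echo "cuda bin missing" ;;
esac
```

To make this survive new shells, the export line would also need to go into a shell profile such as ~/.bashrc.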
UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

The following specifications were found to be incompatible with your system:

  - feature:/linux-64::__glibc==2.36=0
  - feature:|@/linux-64::__glibc==2.36=0

Your installed version is: 2.36
gcc --version
gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Please help.
The conda errors can be cryptic and unrelated to the actual problem. What PyTorch/Python versions are you using? We only support PyTorch 1.12.1/1.13 and Python 3.8/3.9/3.10 at this point.
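A quick way to check that constraint locally is to compare the reported versions against the supported combinations. A minimal sketch (the supported lists simply restate the comment above; in practice you would feed it `torch.__version__` and `sys.version_info`):

```python
SUPPORTED_TORCH = ("1.12.1", "1.13")          # per the maintainer's comment
SUPPORTED_PYTHON = ((3, 8), (3, 9), (3, 10))  # likewise

def is_supported(torch_version: str, py_version: tuple) -> bool:
    """True when both the torch and Python versions are in the supported sets."""
    # Strip the local build tag ("+cu116") before comparing release numbers.
    torch_ok = any(torch_version.split("+")[0].startswith(v) for v in SUPPORTED_TORCH)
    return torch_ok and tuple(py_version[:2]) in SUPPORTED_PYTHON

# The environment reported at the top of this issue:
print(is_supported("1.12.1+cu116", (3, 9, 10)))  # True
print(is_supported("1.11.0", (3, 7, 0)))         # False
```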
Hi @danthe3rd, thanks for your reply! I solved the error by installing PyTorch 1.12 and CUDA Toolkit 11.5.
Successfully built xformers
I think I've fixed the error "internal compiler error: in maybe_undo_parenthesized_ref" with this commit in the flash-attention repo.
Thank you!