
Pip installation failing with 'command '/usr/local/cuda/bin/nvcc' failed with exit code 255'

BenjaminIrwin opened this issue · 10 comments

🐛 Bug

Pip installation fails on Amazon EC2 Instance (Amazon Linux) with confusing error: command '/usr/local/cuda/bin/nvcc' failed with exit code 255

Command

To install, I do the following:

git clone https://github.com/facebookresearch/xformers.git
cd xformers
git submodule update --init --recursive
pip install -r requirements.txt
pip install -v -e .

The error occurs on pip install -v -e . Note that this step also takes a very long time (I had to leave it running overnight).
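Before starting a build this long, it can save time to sanity-check that the compilers pip will invoke are visible at all. A minimal sketch (the tool list and the hint path are illustrative, not an official prerequisite check):

```shell
# Sketch: confirm the required compilers are on PATH before kicking off the
# (very long) xformers source build. The tool list here is an assumption.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: $("$1" --version 2>/dev/null | head -n 1)"
  else
    echo "$1: not found on PATH"
    return 1
  fi
}

check_tool gcc
check_tool nvcc || echo "hint: nvcc usually lives in /usr/local/cuda/bin"
```

If either line reports "not found on PATH", fixing that first is cheaper than discovering it hours into the build.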

To Reproduce

Steps to reproduce the behavior:

See above. I am following these steps in my fork of the stable-diffusion-webui repo and in accordance with these instructions.

Please find the stack trace below, which I see after running pip install -v -e .:

:522: internal compiler error: in maybe_undo_parenthesized_ref, at cp/semantics.c:1739
         BOOL_SWITCH(launch_params.is_dropout, IsDropoutConst, [&] {
                                                                  ^
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See <http://bugzilla.redhat.com/bugzilla> for instructions.
    Preprocessed source stored into /tmp/ccU5sQOR.out file, please attach this to your bugreport.
    error: command '/usr/local/cuda/bin/nvcc' failed with exit code 255
    error: subprocess-exited-with-error
    
    × python setup.py develop did not run successfully.
    │ exit code: 1
    ╰─> See above for output.
    
    note: This error originates from a subprocess, and is likely not a problem with pip.
    full command: /home/ec2-user/ls-stable-diffusion/venv/bin/python3.9 -c '
    exec(compile('"'"''"'"''"'"'
    # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
    #
    # - It imports setuptools before invoking setup.py, to enable projects that directly
    #   import from `distutils.core` to work with newer packaging standards.
    # - It provides a clear error message when setuptools is not installed.
    # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
    #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
    #     manifest_maker: standard file '"'"'-c'"'"' not found".
    # - It generates a shim setup.py, for handling setup.cfg-only projects.
    import os, sys, tokenize
    
    try:
        import setuptools
    except ImportError as error:
        print(
            "ERROR: Can not execute `setup.py` since setuptools is not available in "
            "the build environment.",
            file=sys.stderr,
        )
        sys.exit(1)
    
    __file__ = %r
    sys.argv[0] = __file__
    
    if os.path.exists(__file__):
        filename = __file__
        with tokenize.open(__file__) as f:
            setup_py_code = f.read()
    else:
        filename = "<auto-generated setuptools caller>"
        setup_py_code = "from setuptools import setup; setup()"
    
    exec(compile(setup_py_code, filename, "exec"))
    '"'"''"'"''"'"' % ('"'"'/home/ec2-user/ls-stable-diffusion/repositories/xformers/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' develop --no-deps
    cwd: /home/ec2-user/ls-stable-diffusion/repositories/xformers/
error: subprocess-exited-with-error

× python setup.py develop did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Expected behavior

Expected successful installation.

Environment

PyTorch version: 1.12.1+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A
Installed via: pip
Build command used: pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

OS: Amazon Linux 2 (x86_64)
GCC version: (GCC) 7.3.1 20180712 (Red Hat 7.3.1-15)
Clang version: Could not collect
CMake version: version 2.8.12.2
Libc version: glibc-2.26

Python version: 3.9.10 (main, Sep 20 2022, 12:57:09) [GCC 7.3.1 20180712 (Red Hat 7.3.1-15)] (64-bit runtime)
Python platform: Linux-4.14.290-217.505.amzn2.x86_64-x86_64-with-glibc2.26
Is CUDA available: True
CUDA runtime version: 11.6.124
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 510.73.08
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.3
[pip3] pytorch-lightning==1.7.6
[pip3] torch==1.12.1+cu116
[pip3] torchaudio==0.12.1+cu116
[pip3] torchdiffeq==0.2.3
[pip3] torchmetrics==0.9.3
[pip3] torchvision==0.13.1+cu116
[conda] Could not collect

Additional context

Thank you.

BenjaminIrwin · Oct 17 '22

Hi,

~~The error appears because pip can't find a CUDA compiler (nvcc) to compile the xformers extensions.~~

If it is possible for you, we have made conda binaries available, which let you install xformers with conda install -c "xformers/label/dev" xformers (https://anaconda.org/xformers/xformers). This should make things much easier since, as you noticed, compiling our CUDA extensions takes a long time.

EDIT: looks like it was a compilation issue with nvcc, appearing in the flash-attention implementation if I'm not mistaken... What is your nvcc version?

fmassa · Oct 17 '22

If this issue only appears on the build of flash attention, you can disable its build with:

XFORMERS_DISABLE_FLASH_ATTN=1 pip install -v -e .

cc @tridao are you aware of "internal compiler error" issues with nvcc to build Flash?

danthe3rd · Oct 18 '22

I think I've seen it. I haven't figured out the cause, but I think it's some combination of gcc version and nvcc version.

tridao · Oct 18 '22

Still haven't resolved this issue. Hopefully this info provides more clarity:

GCC version:

(GCC) 7.3.1 20180712 (Red Hat 7.3.1-15)

Cuda version:

Cuda compilation tools, release 11.6, V11.6.124 Build cuda_11.6.r11.6/compiler.31057947_0

BenjaminIrwin · Oct 25 '22

Have you tried disabling Flash Attention as I suggested above?

danthe3rd · Oct 25 '22

> Have you tried disabling Flash Attention as I suggested above?

Yes. Now with gcc 11.3.0 and nvcc cuda_11.5.r11.5, running "XFORMERS_DISABLE_FLASH_ATTN=1 FORCE_CUDA=1 pip install --require-virtualenv git+https://github.com/facebookresearch/xformers.git@main#egg=xformers" gives:

/venv/lib/python3.10/site-packages/torch/include/c10/core/SymInt.h(84): warning #68-D: integer conversion resulted in a change of sign

      /usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
        435 |         function(_Functor&& __f)
            |                                ^
      /usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
      /usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
        530 |         operator=(_Functor&& __f)
            |                                 ^
      /usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
      error: command '/usr/bin/nvcc' failed with exit code 255
      [end of output]

genewitch · Oct 29 '22
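For what it's worth, the std_function.h "parameter packs not expanded" error is commonly reported when nvcc is paired with a gcc-11 host compiler. One workaround sketch, assuming an older host compiler such as gcc-10/g++-10 is installed (the compiler choice is an assumption on my part, not something verified in this thread):

```shell
# Sketch: steer the build toward an older host compiler. gcc-10/g++-10 are
# assumptions -- substitute whatever host compiler your nvcc release supports.
CC="${CC:-gcc-10}"
CXX="${CXX:-g++-10}"
export CC CXX
echo "would build with CC=$CC CXX=$CXX"
# then retry the build, e.g.:
#   XFORMERS_DISABLE_FLASH_ATTN=1 FORCE_CUDA=1 pip install -v -e .
```

nvcc also accepts an explicit host compiler directly via its -ccbin option, which achieves the same thing when you control the nvcc invocation.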

I don't recommend doing this*, however:

nvcc --version:

Cuda compilation tools, release 11.8, V11.8.89 Build cuda_11.8.r11.8/compiler.31833905_0

gcc --version:

gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

successfully completes and xformers works:

XFORMERS_DISABLE_FLASH_ATTN=1 FORCE_CUDA=1 pip install --require-virtualenv git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
[...]
Successfully built xformers
Installing collected packages: xformers
Successfully installed xformers-0.0.14.dev0

hope this helps someone!

[*] I had to apt purge ubuntu-desktop and xserver-xorg-nouveau, and then manually add and purge cuda packages until all of the versions matched. However, even though /etc/alternatives/cuda pointed to /usr/local/cuda-11.8, which nvcc returned nothing, so I had to run export PATH=$PATH:/usr/local/cuda/bin

There may have been other steps; the machine crashed on reboot once and I had to wait for the on-site tech to hit the reset switch.

genewitch · Oct 29 '22
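The PATH fix described above can be scripted so it's idempotent. A sketch, assuming the conventional /usr/local/cuda symlink (set CUDA_HOME first if your install lives elsewhere):

```shell
# Sketch: make sure the CUDA bin directory is on PATH so the build finds nvcc.
# /usr/local/cuda is the conventional symlink, not a guaranteed location.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
case ":$PATH:" in
  *":$CUDA_HOME/bin:"*)
    echo "$CUDA_HOME/bin already on PATH" ;;
  *)
    PATH="$PATH:$CUDA_HOME/bin"
    export PATH
    echo "added $CUDA_HOME/bin to PATH" ;;
esac
```

Putting this in ~/.bashrc (or the venv's activate script) avoids having to remember the export after each reboot.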

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

The following specifications were found to be incompatible with your system:

  • feature:/linux-64::__glibc==2.36=0
  • feature:|@/linux-64::__glibc==2.36=0

Your installed version is: 2.36

gcc --version
gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Please help.

Limbicnation · Nov 22 '22

The conda errors can be cryptic and unrelated to the actual problem. What pytorch/python version are you using? We only support pytorch 1.12.1/1.13 and python 3.8/3.9/3.10 at this point.

danthe3rd · Nov 22 '22
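A quick pre-flight check against those constraints might look like the sketch below. The supported lists are copied from the comment above and may be out of date for later xformers releases:

```python
# Sketch: check the running interpreter and installed torch against the
# support matrix stated above (pytorch 1.12.1/1.13, python 3.8-3.10).
import sys

SUPPORTED_PYTHON = {(3, 8), (3, 9), (3, 10)}
SUPPORTED_TORCH_PREFIXES = ("1.12.1", "1.13")

def python_supported(version_info=None):
    """Return True if the (major, minor) pair is in the supported set."""
    vi = sys.version_info if version_info is None else version_info
    return (vi[0], vi[1]) in SUPPORTED_PYTHON

def torch_supported(torch_version):
    # torch versions look like "1.12.1+cu116"; only the prefix matters here.
    return torch_version.startswith(SUPPORTED_TORCH_PREFIXES)

if __name__ == "__main__":
    print("python supported:", python_supported())
    try:
        import torch
        print("torch supported:", torch_supported(torch.__version__))
    except ImportError:
        print("torch is not installed")
```

Running this before attempting a source build or a conda install narrows down whether a cryptic solver error is really a version mismatch.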

Hi @danthe3rd, thanks for your reply! I solved the error by installing pytorch 1.12 and CUDA Toolkit 11.5. Successfully built xformers!

[Screenshot from 2022-11-22 showing the successful build]

Limbicnation · Nov 22 '22

I think I've fixed the error "internal compiler error: in maybe_undo_parenthesized_ref" with this commit in the flash-attention repo.

tridao · Dec 13 '22

Thank you!

Limbicnation · Jan 26 '23