fairseq
Apex installation won't work with fairseq installation due to pytorch version mismatch
What is your question?
I installed fairseq for the mBART model. Then, while installing apex, I encountered the following error:
What have you tried?
- I have tried installing apex with conda using:
conda install -c conda-forge nvidia-apex
- I have also tried the command from the fairseq instructions:
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--deprecated_fused_adam" --global-option="--xentropy" --global-option="--fast_multihead_attn" ./
What's your environment?
- fairseq Version (e.g., 1.0 or master): fairseq 0.9.0
- PyTorch Version (e.g., 1.0): 1.6.0
- OS (e.g., Linux): CentOS Linux
- How you installed fairseq (pip, source): pip
- Build command you used (if compiling from source):
- Python version: python3.7.7
- CUDA/cuDNN version: release 10.1, V10.1.243
I ran into the same problem. How can it be solved?
I think the reason could be that your PyTorch was compiled with a different version of the CUDA toolkit from the current version. Could you try to reinstall PyTorch and see?
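One way to check for the mismatch suggested above is to compare the CUDA version PyTorch was built with against the version reported by the local nvcc. The following is a minimal sketch, not an official fairseq or apex tool: the helper names parse_nvcc_release and cuda_versions_match are made up for illustration, and it assumes nvcc is on PATH and prints a "release X.Y" line as in the environment report above.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return the 'major.minor' release found in `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def cuda_versions_match(torch_cuda, nvcc_cuda):
    """Compare the major.minor parts of two CUDA version strings."""
    if torch_cuda is None or nvcc_cuda is None:
        return False
    return torch_cuda.split(".")[:2] == nvcc_cuda.split(".")[:2]


def main():
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    try:
        import torch
        torch_cuda = torch.version.cuda
    except (ImportError, OSError) as err:
        print(f"Could not import torch: {err}")
        torch_cuda = None

    nvcc_cuda = None
    if shutil.which("nvcc") is None:
        print("nvcc was not found on PATH")
    else:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=False
        )
        nvcc_cuda = parse_nvcc_release(result.stdout)

    print(f"PyTorch built with CUDA: {torch_cuda}")
    print(f"nvcc reports CUDA:       {nvcc_cuda}")
    if not cuda_versions_match(torch_cuda, nvcc_cuda):
        print("Versions differ or are unknown; compiling CUDA extensions may fail.")


if __name__ == "__main__":
    main()
```

If the two versions differ, reinstalling PyTorch with a build that matches the local toolkit (or installing a toolkit that matches PyTorch) is the thing to try first.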
Just leaving a note for anyone else who may encounter issues installing apex and hoping that someone might be able to help me with the issue I am encountering.
pip does not yet offer PyTorch builds with CUDA 11.6 toolkit support, so changing 113 to 116 in the following command will not work:
pip install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
To get PyTorch with CUDA 11.6 support, run:
conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge
However, installing apex afterwards with the following command still does not work:
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--deprecated_fused_adam" --global-option="--xentropy" --global-option="--fast_multihead_attn" ./
This is where I get stuck. I am getting the following error message:
WARNING: Disabling all use of wheels due to the use of --build-option / --global-option / --install-option.
Using pip 22.1.2 from C:\Users\user\anaconda3\envs\NLLB\lib\site-packages\pip (python 3.8)
Processing c:\users\user\apex
Running command python setup.py egg_info
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "C:\Users\user\apex\setup.py", line 1, in <module>
import torch
File "C:\Users\user\anaconda3\envs\NLLB\lib\site-packages\torch\__init__.py", line 129, in <module>
raise err
OSError: [WinError 127] The specified procedure could not be found. Error loading "C:\Users\user\anaconda3\envs\NLLB\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies.
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
full command: 'C:\Users\user\anaconda3\envs\NLLB\python.exe' -c '
exec(compile('"'"''"'"''"'"'
# This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
#
# - It imports setuptools before invoking setup.py, to enable projects that directly
# import from `distutils.core` to work with newer packaging standards.
# - It provides a clear error message when setuptools is not installed.
# - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
# setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
# manifest_maker: standard file '"'"'-c'"'"' not found".
# - It generates a shim setup.py, for handling setup.cfg-only projects.
import os, sys, tokenize
try:
import setuptools
except ImportError as error:
print(
"ERROR: Can not execute `setup.py` since setuptools is not available in "
"the build environment.",
file=sys.stderr,
)
sys.exit(1)
__file__ = %r
sys.argv[0] = __file__
if os.path.exists(__file__):
filename = __file__
with tokenize.open(__file__) as f:
setup_py_code = f.read()
else:
filename = "<auto-generated setuptools caller>"
setup_py_code = "from setuptools import setup; setup()"
exec(compile(setup_py_code, filename, "exec"))
'"'"''"'"''"'"' % ('"'"'C:\\Users\\user\\apex\\setup.py'"'"',), "<pip-setuptools-caller>", "exec"))
' egg_info --egg-base 'C:\Users\user\AppData\Local\Temp\pip-pip-egg-info-78mf55v0'
cwd: C:\Users\user\apex\
Preparing metadata (setup.py) ... error
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
I've checked my folder, and the file C:\Users\user\anaconda3\envs\NLLB\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll is there. I've tried uninstalling and reinstalling PyTorch, but I continue to get the same error message. I also don't have any other caffe2_detectron_ops.dll file that could be causing issues. Does anyone have a suggestion as to what may be causing this error?
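Since the DLL exists on disk, the "or one of its dependencies" part of the error is worth checking: WinError 127 usually means a library was found but a function it expects from another library was not. As a rough diagnostic sketch (the function name find_unloadable_libraries is hypothetical, and this is not how PyTorch itself reports errors), one can try loading every library in torch\lib individually and list the ones that fail, without importing torch:

```python
import ctypes
import importlib.util
import os
from pathlib import Path


def find_unloadable_libraries(lib_dir, pattern="*.dll"):
    """Try to load each matching library in lib_dir; return (name, error) pairs."""
    lib_dir = Path(lib_dir).resolve()
    failures = []

    # os.add_dll_directory only exists on Windows (Python 3.8+); it lets the
    # loader resolve dependencies that live next to the library being loaded.
    add_dir = getattr(os, "add_dll_directory", None)
    handle = add_dir(str(lib_dir)) if add_dir and lib_dir.is_dir() else None
    try:
        for path in sorted(lib_dir.glob(pattern)):
            try:
                ctypes.CDLL(str(path))
            except OSError as err:
                failures.append((path.name, str(err)))
    finally:
        if handle is not None:
            handle.close()
    return failures


if __name__ == "__main__":
    # Locate the installed torch package without importing it, because the
    # import itself is what fails in the traceback above.
    spec = importlib.util.find_spec("torch")
    if spec is None or not spec.submodule_search_locations:
        print("torch is not installed in this environment")
    else:
        torch_lib = Path(list(spec.submodule_search_locations)[0]) / "lib"
        for name, error in find_unloadable_libraries(torch_lib):
            print(f"{name}: {error}")
```

If several libraries fail rather than just one, that points toward a mismatched or missing CUDA runtime in the environment rather than a single corrupt file.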
Here is a simple fix:
conda install cudatoolkit-dev=11.3 gxx=10.3 cuda-nvcc=11.3 -c conda-forge -c nvidia
https://github.com/gordicaleksa/Open-NLLB/blob/nllb_replication/INSTALL.md <- check it out here
CUDA 11.7 is a better choice, however: an old version of cudatoolkit (and nvcc) can cause other tricky problems.