
Apex installation fails alongside fairseq due to a PyTorch version mismatch

Status: Open • opened by kaushal0494 4 years ago • 6 comments

What is your question?

I installed fairseq to use the mBART model, then ran into the following errors while installing apex: [screenshots: error1, error2, error]

What have you tried?

  1. I tried installing apex with conda: conda install -c conda-forge nvidia-apex
  2. I also tried the build command from the fairseq installation instructions, run from the apex source directory:
     pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
       --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
       --global-option="--fast_multihead_attn" ./
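After either attempt, a quick way to see which apex extensions actually got compiled is to check whether their modules can be located. This is a minimal sketch: the flag-to-module mapping below (apex_C, amp_C, fused_adam_cuda, xentropy_cuda) is an assumption based on apex's setup.py and may differ between apex versions, and --fast_multihead_attn is left out because its module names have changed over time.

```python
import importlib.util

# Assumption: apex build flag -> compiled extension module it produces.
# Based on apex's setup.py; names may differ in other apex versions.
EXPECTED_EXTENSIONS = {
    "--cpp_ext": "apex_C",
    "--cuda_ext": "amp_C",
    "--deprecated_fused_adam": "fused_adam_cuda",
    "--xentropy": "xentropy_cuda",
}


def check_apex_extensions(expected=EXPECTED_EXTENSIONS):
    """Map each build flag to True if its extension module is present on sys.path."""
    results = {}
    for flag, module_name in expected.items():
        # find_spec only locates the module; it does not load the compiled library,
        # so True here can still fail at import time (e.g. a CUDA/PyTorch mismatch).
        results[flag] = importlib.util.find_spec(module_name) is not None
    return results


if __name__ == "__main__":
    for flag, found in check_apex_extensions().items():
        print(f"{flag:<26} {'found' if found else 'MISSING'}")
```

Run it with the same Python interpreter that pip used; a MISSING entry means that part of the build did not produce its module.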

What's your environment?

  • fairseq Version (e.g., 1.0 or master): fairseq 0.9.0
  • PyTorch Version (e.g., 1.0): 1.6.0
  • OS (e.g., Linux): CentOS Linux
  • How you installed fairseq (pip, source): pip
  • Build command you used (if compiling from source):
  • Python version: python3.7.7
  • CUDA/cuDNN version: release 10.1, V10.1.243

kaushal0494, Jul 29 '20

I'm running into the same problem. How can it be solved?

ttzHome, Sep 23 '20

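I think the cause may be that your PyTorch binary was compiled against a different CUDA toolkit version than the one currently installed on your system, which is the toolkit (nvcc) used to compile apex's CUDA extensions. Could you try reinstalling PyTorch with a build that matches your CUDA version and see whether that helps?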
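One way to test this hypothesis is to compare the CUDA version PyTorch reports (torch.version.cuda) with the release of the nvcc that would compile apex. This is a minimal sketch, assuming nvcc is on PATH and that a matching major.minor version is what matters; the helper names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` text, e.g. 'release 10.1, V10.1.243'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find 'release X.Y' in nvcc output")
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(version):
    """Turn torch.version.cuda (e.g. '10.2') into (major, minor); None means a CPU-only build."""
    if version is None:
        return None
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def versions_match(torch_cuda, nvcc_output):
    """True when the CUDA version PyTorch was built with equals the local nvcc release."""
    built_with = parse_torch_cuda(torch_cuda)
    return built_with is not None and built_with == parse_nvcc_release(nvcc_output)


def report():
    """Print both versions; needs PyTorch installed and nvcc on PATH."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc not found on PATH, so apex's CUDA extensions cannot be compiled")
        return
    out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
    print("torch built with CUDA:", torch.version.cuda)
    print("local nvcc release:  ", parse_nvcc_release(out))
    print("match:", versions_match(torch.version.cuda, out))


if __name__ == "__main__":
    report()
```

With the environment in the original post, nvcc reports release 10.1, so torch.version.cuda would be expected to read 10.1 as well; if it prints something else, a PyTorch build for CUDA 10.1 (or a toolkit matching PyTorch's build) is the next thing to try.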

shawnlimn, Mar 11 '21

This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!

stale[bot], Jun 16 '21

Leaving a note for anyone else who runs into problems installing apex, and hoping someone can help with the issue I'm hitting.


The PyTorch pip wheels do not support CUDA toolkit 11.6 yet, so changing cu113 to cu116 in pip install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html will not work. To get a PyTorch build for CUDA 11.6, use conda instead: conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge.
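The cu113 part is a local version tag that appears in each package pin and in the -f index URL, so switching CUDA versions means changing it in all four places. A small sketch that rebuilds the command above for a given CUDA version (the helper names are hypothetical). It only assembles strings: it cannot tell whether wheels for that tag actually exist, which is exactly why cu116 fails here.

```python
def cuda_wheel_tag(cuda_version):
    """'11.3' -> 'cu113', the local-version suffix used by PyTorch's CUDA wheels."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def torch_find_links_url(cuda_version):
    """Index page following the same pattern as the -f URL in the command above."""
    return f"https://download.pytorch.org/whl/{cuda_wheel_tag(cuda_version)}/torch_stable.html"


def pip_install_args(cuda_version, torch_version="1.10.1",
                     torchvision_version="0.11.2", torchaudio_version="0.10.1"):
    """Build the pip command as an argument list; defaults are the versions quoted above."""
    tag = cuda_wheel_tag(cuda_version)
    return [
        "pip", "install",
        f"torch=={torch_version}+{tag}",
        f"torchvision=={torchvision_version}+{tag}",
        f"torchaudio=={torchaudio_version}+{tag}",
        "-f", torch_find_links_url(cuda_version),
    ]


if __name__ == "__main__":
    # Reproduces the cu113 command quoted in this comment.
    print(" ".join(pip_install_args("11.3")))
```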

However, installing apex afterwards with pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--deprecated_fused_adam" --global-option="--xentropy" --global-option="--fast_multihead_attn" ./ still fails.

This is where I get stuck. I am getting the following error message:

WARNING: Disabling all use of wheels due to the use of --build-option / --global-option / --install-option.
Using pip 22.1.2 from C:\Users\user\anaconda3\envs\NLLB\lib\site-packages\pip (python 3.8)
Processing c:\users\user\apex 
  Running command python setup.py egg_info
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "C:\Users\user\apex\setup.py", line 1, in <module>
      import torch
    File "C:\Users\user\anaconda3\envs\NLLB\lib\site-packages\torch\__init__.py", line 129, in <module>                                                  
      raise err 
  OSError: [WinError 127] The specified procedure could not be found. Error loading "C:\Users\user\anaconda3\envs\NLLB\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies. 
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: 'C:\Users\user\anaconda3\envs\NLLB\python.exe' -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  # 
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so 
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize    

  try:
      import setuptools
  except ImportError as error:
      print(
             "ERROR: Can not execute `setup.py` since setuptools is not available in "
             "the build environment.",
             file=sys.stderr,
      )
      sys.exit(1)

  __file__ = %r
  sys.argv[0] = __file__

  if os.path.exists(__file__):
      filename = __file__ 
      with tokenize.open(__file__) as f: 
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"

  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'C:\\Users\\user\\apex\\setup.py'"'"',), "<pip-setuptools-caller>", "exec"))
' egg_info --egg-base 'C:\Users\user\AppData\Local\Temp\pip-pip-egg-info-78mf55v0'
  cwd: C:\Users\user\apex\
  Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

I've checked, and the file C:\Users\user\anaconda3\envs\NLLB\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll does exist. I've tried uninstalling and reinstalling PyTorch, but I still get the same error. I also don't have any other caffe2_detectron_ops.dll file that could be causing a conflict. Does anyone know what might be causing this error?

egoetz, Aug 08 '22

Here is a simple fix: conda install cudatoolkit-dev=11.3 gxx=10.3 cuda-nvcc=11.3 -c conda-forge -c nvidia

See https://github.com/gordicaleksa/Open-NLLB/blob/nllb_replication/INSTALL.md for details.

gordicaleksa, Sep 04 '23

Regarding the fix above (conda install cudatoolkit-dev=11.3 gxx=10.3 cuda-nvcc=11.3 -c conda-forge -c nvidia, from the Open-NLLB INSTALL.md):

11.7 is a better choice, since an older cudatoolkit (and nvcc) version can cause other tricky problems.

zhiqu22, Apr 07 '24