Can't build apex on Windows 10, Visual Studio 2019 (alignment issue)

Open • gordicaleksa opened this issue 3 years ago • 1 comment

After creating a conda environment by following the instructions from BigScience here, i.e.:

  1. conda create -n bloom python=3.9
  2. conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch (my system-wide CUDA is 11.3 as well)

I'm hitting an error installing apex.

Again, per the instructions linked above, I'm using this command to install apex: pip install --global-option="--cpp_ext" --global-option="--cuda_ext" --no-cache -v --disable-pip-version-check . 2>&1 | tee build.log

Here is the error:

C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.24.28314/include\type_traits(1061): error: static assertion failed with "You've instantiated std::aligned_storage<Len, Align> with an extended alignment (in other words, Align > alignof(max_align_t)). Before VS 2017 15.8, the member "type" would non-conformingly have an alignment of only alignof(max_align_t). VS 2017 15.8 was fixed to handle this correctly, but the fix inherently changes layout and breaks binary compatibility (*only* for uses of aligned_storage with extended alignments). Please define either (1) _ENABLE_EXTENDED_ALIGNED_STORAGE to acknowledge that you understand this message and that you actually want a type with an extended alignment, or (2) _DISABLE_EXTENDED_ALIGNED_STORAGE to silence this message and get the old non-conforming behavior."
            detected during:
              instantiation of class "std::_Aligned<_Len, _Align, double, false> [with _Len=16ULL, _Align=16ULL]"
  (1079): here
              instantiation of class "std::_Aligned<_Len, _Align, int, false> [with _Len=16ULL, _Align=16ULL]"
  (1084): here
              instantiation of class "std::_Aligned<_Len, _Align, short, false> [with _Len=16ULL, _Align=16ULL]"
  (1089): here
              instantiation of class "std::_Aligned<_Len, _Align, char, false> [with _Len=16ULL, _Align=16ULL]"
  (1094): here
              instantiation of class "std::aligned_storage<_Len, _Align> [with _Len=16ULL, _Align=16ULL]"
  T:\YouTube_Code\7_BLOOM\apex\csrc\multi_tensor_axpby_kernel.cu(23): here
              instantiation of "void load_store(T *, T *, int, int) [with T=float]"
  T:\YouTube_Code\7_BLOOM\apex\csrc\multi_tensor_axpby_kernel.cu(68): here
              instantiation of "void AxpbyFunctor<x_t, y_t, out_t>::operator()(int, volatile int *, TensorListMetadata<3> &, float, float, int) [with x_t=float, y_t=float, out_t=float]"
  T:\YouTube_Code\7_BLOOM\apex\csrc\multi_tensor_apply.cuh(38): here
              instantiation of "void multi_tensor_apply_kernel(int, volatile int *, T, U, ArgTypes...) [with T=TensorListMetadata<3>, U=AxpbyFunctor<float, float, float>, ArgTypes=<float, float, int>]"
  T:\YouTube_Code\7_BLOOM\apex\csrc\multi_tensor_apply.cuh(109): here
              instantiation of "void multi_tensor_apply<depth,T,ArgTypes...>(int, int, const at::Tensor &, const std::vector<std::vector<at::Tensor, std::allocator<at::Tensor>>, std::allocator<std::vector<at::Tensor, std::allocator<at::Tensor>>>> &, T, ArgTypes...) [with depth=3, T=AxpbyFunctor<float, float, float>, ArgTypes=<float, float, int>]"
  T:\YouTube_Code\7_BLOOM\apex\csrc\multi_tensor_axpby_kernel.cu(141): here

  1 error detected in the compilation of "csrc/multi_tensor_axpby_kernel.cu".
  error: command 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.7\\bin\\nvcc.exe' failed with exit code 4294967295
  error: subprocess-exited-with-error

  Running setup.py install for apex did not run successfully.
  exit code: 1

  See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: 'C:\Users\aleks\Miniconda3\envs\bloom\python.exe' -u -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize

  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)

  __file__ = %r
  sys.argv[0] = __file__

  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"

  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'T:\\YouTube_Code\\7_BLOOM\\apex\\setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' --cpp_ext --cuda_ext install --record 'C:\Users\aleks\AppData\Local\Temp\pip-record-12qb6t0s\install-record.txt' --single-version-externally-managed --compile --install-headers 'C:\Users\aleks\Miniconda3\envs\bloom\Include\apex'
  cwd: T:\YouTube_Code\7_BLOOM\apex\
  Running setup.py install for apex: finished with status 'error'
error: legacy-install-failure

Encountered error while trying to install package.

apex

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

My environment: Windows 10, Python 3.9, CUDA 11.3, PyTorch 1.12.1

gordicaleksa, Aug 16 '22 08:08

Hi, gordicaleksa. I ran into the same problem when installing apex on Windows 10. Have you solved it? Thanks for any suggestions.

I solved the problem by following https://github.com/NVIDIA/apex/issues/835#issuecomment-647054393

letian-zhang, Oct 15 '22 01:10
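
For readers landing here with the same log: the static assertion itself names two macros, _ENABLE_EXTENDED_ALIGNED_STORAGE (opt in to the conforming extended alignment) and _DISABLE_EXTENDED_ALIGNED_STORAGE (keep the old behavior). One way to act on that message is to pass one of them as a -D define to both the host compiler and nvcc. The snippet below is only a minimal sketch of that idea. The helper name with_alignment_define is hypothetical, and it assumes the extension's setup.py passes extra_compile_args as a dict with 'cxx' and 'nvcc' keys, which is the form PyTorch's CUDAExtension accepts. Whether this is the same change as the one described in the linked comment has not been verified here.

```python
# Sketch only: add the macro named in the MSVC static assertion to the
# compile flags of a CUDA extension. The helper name and the example flag
# lists are assumptions for illustration, not taken from apex's setup.py.

ALIGN_MACRO = "_ENABLE_EXTENDED_ALIGNED_STORAGE"


def with_alignment_define(extra_compile_args, macro=ALIGN_MACRO):
    """Return a copy of a {'cxx': [...], 'nvcc': [...]} dict with -D<macro> added.

    The input dict is left unchanged, and the flag is not duplicated if it
    is already present for a given tool.
    """
    flag = "-D" + macro
    patched = {}
    for tool, flags in extra_compile_args.items():
        flags = list(flags)  # copy so the caller's list is not mutated
        if flag not in flags:
            flags.append(flag)
        patched[tool] = flags
    return patched


# Example flag lists (hypothetical; substitute the ones from the real setup.py).
original = {"cxx": ["-O3"], "nvcc": ["-O3", "--use_fast_math"]}
args = with_alignment_define(original)
print(args)
```

In a setup.py, the returned dict would be passed as extra_compile_args to the extension that fails to build (csrc/multi_tensor_axpby_kernel.cu in the log above). Choosing _DISABLE_EXTENDED_ALIGNED_STORAGE instead only requires passing that name as the macro argument.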