Cannot install apex

[Open] dheera opened this issue 1 year ago • 4 comments

NVIDIA's apex fails with actual syntax errors. Can someone point me to a version that actually compiles? Thanks!

$ export CC=/usr/bin/gcc-12
$ export CXX=/usr/bin/g++-12
$ pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" git+https://github.com/NVIDIA/apex.git
...
...
  [2/2] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pip-req-build-ekqfqf73/build/temp.linux-x86_64-cpython-310/csrc/mlp_cuda.o.d -I/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include -I/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/TH -I/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/dheera/miniconda3/envs/opensora/include/python3.10 -c -c /tmp/pip-req-build-ekqfqf73/csrc/mlp_cuda.cu -o /tmp/pip-req-build-ekqfqf73/build/temp.linux-x86_64-cpython-310/csrc/mlp_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=mlp_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 -ccbin /usr/bin/gcc-12 -std=c++17
  FAILED: /tmp/pip-req-build-ekqfqf73/build/temp.linux-x86_64-cpython-310/csrc/mlp_cuda.o
  /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pip-req-build-ekqfqf73/build/temp.linux-x86_64-cpython-310/csrc/mlp_cuda.o.d -I/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include -I/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/TH -I/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/dheera/miniconda3/envs/opensora/include/python3.10 -c -c /tmp/pip-req-build-ekqfqf73/csrc/mlp_cuda.cu -o /tmp/pip-req-build-ekqfqf73/build/temp.linux-x86_64-cpython-310/csrc/mlp_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=mlp_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 -ccbin /usr/bin/gcc-12 -std=c++17
  /home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h: In function ‘typename pybind11::detail::type_caster<typename pybind11::detail::intrinsic_type<T>::type>::cast_op_type<T> pybind11::detail::cast_op(make_caster<T>&)’:
  /home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h:45:120: error: expected template-name before ‘<’ token
     45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
        |                                                                                                                        ^
  /home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h:45:120: error: expected identifier before ‘<’ token
  /home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h:45:123: error: expected primary-expression before ‘>’ token
     45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
        |                                                                                                                           ^
  /home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h:45:126: error: expected primary-expression before ‘)’ token
     45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
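
This kind of "expected template-name before '<'" failure comes from nvcc compiling the pybind11 headers, and it often indicates a host compiler that is too new for the installed CUDA toolkit (CUDA 11.x toolkits generally do not accept GCC 12) rather than a real syntax error in apex. A minimal sketch, assuming python, nvcc and gcc-12 are the same ones used by the build command above, to print the versions that need to agree:

  # Minimal sketch: print the toolchain versions that must be compatible before
  # building apex's CUDA extensions (assumes nvcc and gcc-12 are on PATH).
  import subprocess
  import torch

  print("torch:", torch.__version__, "| built against CUDA", torch.version.cuda)
  print(subprocess.check_output(["nvcc", "--version"], text=True).strip().splitlines()[-1])
  print(subprocess.check_output(["gcc-12", "--version"], text=True).splitlines()[0])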

dheera commented Apr 18 '24, 17:04

I have the same issue. I tried installing apex with a different command:

$ git clone https://github.com/NVIDIA/apex.git
$ cd apex
$ pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

but then running the inference script failed:

$ torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x512x512.py --ckpt-path OpenSora-v1-HQ-16x512x512.pth --prompt-path ./assets/texts/t2v_samples.txt

with the error:

  File "/gl_data/soft/miniconda3/envs/opensora/lib/python3.10/site-packages/apex/amp/_amp_state.py", line 14, in <module>
    from torch._six import container_abcs
  ModuleNotFoundError: No module named 'torch._six'

So I think the installed apex version does not correspond to the PyTorch version. I would be very thankful if someone could give me a solution or the corresponding apex version for torch 2.2.2+cu118.
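
For context, the torch._six module was removed in newer PyTorch releases (it is gone in the 2.x line), so an apex checkout whose Python sources still import it will fail at runtime on torch 2.2.2 no matter how it was compiled. A quick check, as a minimal sketch:

  # Minimal sketch: check whether the installed torch still provides torch._six,
  # which apex's amp code imports in older revisions.
  import importlib.util
  import torch

  print("torch version:", torch.__version__)
  print("torch._six present:", importlib.util.find_spec("torch._six") is not None)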

wytyl13 commented Apr 19 '24, 02:04

I have found that the error is caused by the torch version being too new for that apex code: https://github.com/NVIDIA/apex/pull/1049

wytyl13 commented Apr 19 '24, 03:04

I have successfully handled the problem by using

  TORCH_MAJOR = int(torch.__version__.split('.')[0])
  TORCH_MINOR = int(torch.__version__.split('.')[1])
  if TORCH_MAJOR == 1 and TORCH_MINOR < 8:
      from torch._six import container_abcs
  else:
      import collections.abc as container_abcs

to replace the torch._six imports, for example the "from torch._six import string_classes" line edited with vim /gl_data/soft/miniconda3/envs/opensora/lib/python3.10/site-packages/apex/amp/_initialize.py +2. But I have run into new problems, which I think are also caused by the torch version: xformers depends on a newer torch build (the lowest is cu118), while my nvcc version is 11.6, so the solution is to update nvcc to 11.8 or higher.

  (opensora) [root@bogon Open-Sora]# nvcc --version
  nvcc: NVIDIA (R) Cuda compiler driver
  Copyright (c) 2005-2021 NVIDIA Corporation
  Built on Fri_Dec_17_18:16:03_PST_2021
  Cuda compilation tools, release 11.6, V11.6.55
  Build cuda_11.6.r11.6/compiler.30794723_0
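
For reference, the same guarded-import pattern can be applied to the string_classes import mentioned above. A minimal sketch (assumption: on torch releases that no longer ship torch._six, plain str is the drop-in replacement, which is what PyTorch's own code moved to):

  # Minimal sketch of a torch._six compatibility shim for string_classes,
  # mirroring the container_abcs version check shown above.
  try:
      from torch._six import string_classes  # available on older torch releases
  except ImportError:
      string_classes = str  # newer torch removed torch._six; str serves the same role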

wytyl13 commented Apr 19 '24, 03:04

Thanks for sharing your solution. And for cross-referencing, this issue is similar to issue #258.

JThh commented Apr 21 '24, 12:04