Open-Sora
Cannot install apex
NVIDIA's apex has actual syntax errors -_____________- can someone point me to a version that actually compiles? Thanks!
$ export CC=/usr/bin/gcc-12
$ export CXX=/usr/bin/g++-12
$ pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" git+https://github.com/NVIDIA/apex.git
...
...
[2/2] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pip-req-build-ekqfqf73/build/temp.linux-x86_64-cpython-310/csrc/mlp_cuda.o.d -I/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include -I/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/TH -I/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/dheera/miniconda3/envs/opensora/include/python3.10 -c -c /tmp/pip-req-build-ekqfqf73/csrc/mlp_cuda.cu -o /tmp/pip-req-build-ekqfqf73/build/temp.linux-x86_64-cpython-310/csrc/mlp_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=mlp_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 -ccbin /usr/bin/gcc-12 -std=c++17
FAILED: /tmp/pip-req-build-ekqfqf73/build/temp.linux-x86_64-cpython-310/csrc/mlp_cuda.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pip-req-build-ekqfqf73/build/temp.linux-x86_64-cpython-310/csrc/mlp_cuda.o.d -I/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include -I/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/TH -I/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/dheera/miniconda3/envs/opensora/include/python3.10 -c -c /tmp/pip-req-build-ekqfqf73/csrc/mlp_cuda.cu -o /tmp/pip-req-build-ekqfqf73/build/temp.linux-x86_64-cpython-310/csrc/mlp_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=mlp_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 -ccbin /usr/bin/gcc-12 -std=c++17
/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h: In function ‘typename pybind11::detail::type_caster<typename pybind11::detail::intrinsic_type<T>::type>::cast_op_type<T> pybind11::detail::cast_op(make_caster<T>&)’:
/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h:45:120: error: expected template-name before ‘<’ token
45 | return caster.operator typename make_caster<T>::template cast_op_type<T>();
| ^
/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h:45:120: error: expected identifier before ‘<’ token
/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h:45:123: error: expected primary-expression before ‘>’ token
45 | return caster.operator typename make_caster<T>::template cast_op_type<T>();
| ^
/home/dheera/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h:45:126: error: expected primary-expression before ‘)’ token
45 | return caster.operator typename make_caster<T>::template cast_op_type<T>();
I have the same issue. I installed another apex version with the following commands:
git clone https://github.com/NVIDIA/apex.git
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
But I then failed to run the inference script:
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x512x512.py --ckpt-path OpenSora-v1-HQ-16x512x512.pth --prompt-path ./assets/texts/t2v_samples.txt
The error is as follows:
File "/gl_data/soft/miniconda3/envs/opensora/lib/python3.10/site-packages/apex/amp/_amp_state.py", line 14, in <module>
from torch._six import container_abcs
ModuleNotFoundError: No module named 'torch._six'
So I think the cause is that the apex version does not match the PyTorch version. I would be very grateful if someone could give me a solution, or point me to the apex version that corresponds to torch 2.2.2+cu118.
I have found that the error occurs because the torch version is too new; see https://github.com/NVIDIA/apex/pull/1049
I have successfully fixed the problem by replacing the line "from torch._six import string_classes" (opened with: vim /gl_data/soft/miniconda3/envs/opensora/lib/python3.10/site-packages/apex/amp/_initialize.py +2) with:
TORCH_MAJOR = int(torch.__version__.split('.')[0])
TORCH_MINOR = int(torch.__version__.split('.')[1])
if TORCH_MAJOR == 1 and TORCH_MINOR < 8:
    from torch._six import container_abcs
else:
    import collections.abc as container_abcs
But I then ran into new problems, which I think are also caused by the torch version: xformers depends on a newer torch whose lowest CUDA build is cu118, while my nvcc version is 11.6. So the solution is to update nvcc to 11.8 or higher.
(opensora) [root@bogon Open-Sora]# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Fri_Dec_17_18:16:03_PST_2021
Cuda compilation tools, release 11.6, V11.6.55
Build cuda_11.6.r11.6/compiler.30794723_0
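For anyone patching by hand: the line being replaced imports string_classes, while the version guard only defines container_abcs, so code that still uses string_classes may need a fallback too. Below is a minimal sketch of a guard that covers both names. The helper names (parse_major_minor, resolve_six_names) are made up for illustration and are not part of apex or PyTorch, and the (str, bytes) fallback is an assumption based on how torch._six defined string_classes in the 1.x releases I am aware of.

```python
import collections.abc
import importlib


def parse_major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '2.2.2+cu118'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def resolve_six_names(torch_version: str):
    """Return (container_abcs, string_classes) for the given torch version.

    Hypothetical helper: old PyTorch gets the names from the private
    torch._six module, newer PyTorch falls back to the standard library.
    """
    major, minor = parse_major_minor(torch_version)
    if major == 1 and minor < 8:
        # Same cutoff as the guard posted above; torch._six still exists here.
        six = importlib.import_module("torch._six")
        return six.container_abcs, six.string_classes
    # Assumption: (str, bytes) mirrors what torch._six used to export.
    return collections.abc, (str, bytes)


# Example with the version from this thread; torch itself is not imported.
container_abcs, string_classes = resolve_six_names("2.2.2+cu118")
```

Inside the apex file you would pass torch.__version__ instead of the literal string. Whether your apex checkout actually uses string_classes after line 2 is worth checking before relying on this.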
Thanks for sharing your solution. For cross-reference, this issue is similar to issue #258.
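On the CUDA mismatch reported above (nvcc 11.6 against a cu118 build), here is a rough sketch of how the toolkit release could be checked from the text that nvcc --version prints. The function names are hypothetical, and the (11, 8) threshold is simply the value mentioned in this thread, not an official requirement.

```python
import re


def parse_nvcc_release(nvcc_output: str) -> tuple[int, int]:
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' field found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def toolkit_is_new_enough(nvcc_output: str, required: tuple[int, int]) -> bool:
    """True if the reported toolkit release is at least `required`."""
    return parse_nvcc_release(nvcc_output) >= required


# Line copied from the nvcc output posted in this thread.
SAMPLE = "Cuda compilation tools, release 11.6, V11.6.55"
print(toolkit_is_new_enough(SAMPLE, (11, 8)))  # False
```

To check a live machine, the output of subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout can be passed in place of SAMPLE.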