Failed the first time, succeeded the next time?
I ran the following command:
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
The first run failed with:
RuntimeError: Error compiling objects for extension
error: subprocess-exited-with-error
× Building wheel for apex (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /home/zsf/anaconda3/envs/pyt231py312_2_linux/bin/python3.12 /home/zsf/anaconda3/envs/pyt231py312_2_linux/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmp2ijgo8p4
cwd: /home/zsf/anaconda3/envs/pyt231py312_2_linux/apex
Building wheel for apex (pyproject.toml) ... error
ERROR: Failed building wheel for apex
Failed to build apex
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (apex)
This was actually progress. The assorted pip install commands I had tried before failed in other ways: some with TypeError: str, others with No module named torch (even though PyTorch was already installed).
The second time, the output was too noisy for me, so I dropped the -v flag. After waiting several minutes (about 10?), it reported success, which is absurd.
Processing /home/zsf/anaconda3/envs/pyt231py312_2_linux/apex
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: packaging>20.6 in /home/zsf/anaconda3/envs/pyt231py312_2_linux/lib/python3.12/site-packages (from apex==0.1) (24.1)
Building wheels for collected packages: apex
Building wheel for apex (pyproject.toml) ... done
Created wheel for apex: filename=apex-0.1-cp312-cp312-linux_x86_64.whl size=4844829 sha256=5256a4aa59e969e609ca1ba25f616b68607eac921bde36fbff1c063a4515a570
Stored in directory: /tmp/pip-ephem-wheel-cache-milgfajo/wheels/45/ef/09/6cfbe9deb98dfb0c3024c7fb91f389935bccbff826387be8f2
Successfully built apex
Installing collected packages: apex
Successfully installed apex-0.1
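An alternative to dropping -v is to keep the verbose output, save it to a file, and filter it for the lines that matter. Below is a minimal Python sketch, assuming the build output is available as text. The function name find_error_lines and the list of markers are illustrative assumptions based on the failed output above; they are not something pip provides, and the markers will not catch every kind of failure.

```python
import re

# Markers chosen from the failed build output in this thread (an assumption,
# not an exhaustive list of what pip or a compiler can print).
ERROR_MARKERS = re.compile(r"error:|RuntimeError|FAILED", re.IGNORECASE)


def find_error_lines(log_text):
    """Return (line_number, text) pairs for lines that look like errors."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_MARKERS.search(line):
            hits.append((number, line.strip()))
    return hits


# A few lines copied from the logs above, used as sample input.
sample_log = """Building wheel for apex (pyproject.toml) ... error
RuntimeError: Error compiling objects for extension
error: subprocess-exited-with-error
exit code: 1
ERROR: Failed building wheel for apex
"""

for number, line in find_error_lines(sample_log):
    print(f"{number}: {line}")
```

With a real build, the same function could be applied to the contents of a saved log file. The first compiler error usually appears well above the final "Failed building wheel" summary, so the earliest hits are the ones worth reading.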
I installed apex in a conda virtual environment. The command I used was:
pip install --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
The virtual environment uses PyTorch 2.3.1 with CUDA 12.1 (cuda_version: 12.1), and the system is Ubuntu 22.04 LTS.
When installing apex with --config-settings "--build-option=--cpp_ext" and --config-settings "--build-option=--cuda_ext", you need gcc and a CUDA toolkit whose version matches the CUDA version of the virtual environment. The CUDA toolkit is installed on the system, not in the virtual environment.
CUDA toolkit installers are available at https://developer.nvidia.com/cuda-toolkit-archive. Be sure to install the release that matches the CUDA version of the virtual environment.
This is the error reported when the installed CUDA toolkit version does not match the CUDA version in the virtual environment:
RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. Pytorch binaries were compiled with Cuda 11.3.
In some cases, a minor-version mismatch will not cause later errors: https://github.com/NVIDIA/apex/pull/323#discussion_r287021798. You can try commenting out this check (at your own risk).
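The mismatch can be checked before starting a build that takes several minutes. The sketch below compares the CUDA version PyTorch was built with (torch.version.cuda) against the release reported by the system's nvcc --version. The helper names are hypothetical, and this is not apex's own check, only a rough pre-flight comparison under the assumption that nvcc is on PATH.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_cuda_version(version):
    """Turn a string such as '12.1' into (12, 1)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def describe_match(torch_cuda, toolkit_cuda):
    """Classify two (major, minor) tuples."""
    if torch_cuda == toolkit_cuda:
        return "match"
    if torch_cuda[0] == toolkit_cuda[0]:
        return "minor mismatch"
    return "major mismatch"


def system_nvcc_release():
    """Run the nvcc found on PATH (if any) and parse its release number."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True,
                            text=True, check=False)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    toolkit = system_nvcc_release()
    if torch is None or torch.version.cuda is None:
        print("No CUDA-enabled PyTorch is importable from this interpreter")
    elif toolkit is None:
        print("nvcc was not found on PATH")
    else:
        built_with = parse_cuda_version(torch.version.cuda)
        print(built_with, toolkit, describe_match(built_with, toolkit))
```

Running this with the same interpreter that pip uses also confirms that torch is importable from that environment.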
Try this? https://github.com/AlongWY/apex_wheels/releases/tag/v24.4.1
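Thanks, I installed apex 24.4.1+cu121torch2.3.1 very easily.
I do have some questions, though. How does the apex from your link differ from the official one? Where do these whl files come from? If installation can be this simple, why doesn't the official site offer it?
Is 24.4.1 a version number expressed as a year, month, and day? The apex I installed from the official source reports version 0.1. Has the official project never updated its version?
I find this unconventional installation situation for apex very puzzling.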
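For anyone comparing the two installs, the reported version can be read with importlib.metadata, and the part after the "+" is what PEP 440 calls a local version label. The sketch below splits such a string. The cu<digits>torch<version> pattern is an assumption inferred only from the single string 24.4.1+cu121torch2.3.1 in this thread, and the helper names are hypothetical.

```python
import re
from importlib import metadata

# Assumed naming pattern, inferred from "cu121torch2.3.1" only.
LOCAL_TAG = re.compile(r"cu(?P<cuda>\d+)torch(?P<torch>[\d.]+)")


def installed_version(package="apex"):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def split_local_label(version):
    """Split 'public+local' into (public, local); local is None if absent."""
    public, _, local = version.partition("+")
    return public, (local or None)


def parse_build_tag(local):
    """Return {'cuda': ..., 'torch': ...} if the label fits the assumed pattern."""
    match = LOCAL_TAG.fullmatch(local)
    return match.groupdict() if match else None


print(installed_version())  # version of apex in this environment, if any
print(split_local_label("24.4.1+cu121torch2.3.1"))
print(split_local_label("0.1"))
```

A source build reporting 0.1 and a prebuilt wheel reporting 24.4.1+cu121torch2.3.1 would therefore differ both in the public version and in carrying a build-specific label.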
What is your usual setup? We build and test apex mostly in containers. "docker pull nvcr.io/nvidia/pytorch:25.06-py3"
Next time you can try building the apex wheel in one of these containers (you can also change 25.06 to another year/month).
Regarding the wheels I linked: I build these packages automatically with GitHub Actions for my own convenience. They are built only from apex's tagged branches and are not affiliated with the official apex project.