CPM-Bee
Version issue with the bmtrain package
Environment: Ubuntu Server 22.04, conda, Python 3.10.0, NVIDIA driver 12.1.0
Attempt 1:
pip install -r requirements.txt
Error:
Collecting torch<2.0.0,>=1.10
Using cached torch-1.13.1-cp310-cp310-manylinux1_x86_64.whl (887.5 MB)
Collecting bmtrain>=0.2.1
Using cached bmtrain-0.2.2.tar.gz (58 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-uftha_ch/bmtrain_6f467e6c12814fa9a32dccab67860fad/setup.py", line 2, in <module>
import torch
ModuleNotFoundError: No module named 'torch'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
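The traceback shows bmtrain's setup.py importing torch on its second line, so torch has to be importable before pip can even generate bmtrain's metadata. Listing both packages in one requirements file does not help, because pip prepares the metadata of source distributions before anything has been installed. A minimal sketch of one workaround is to split the requirement lines into two passes so that torch goes first; the helper name `split_build_first` and the two-pass idea are illustrative assumptions, not part of CPM-Bee:

```python
import re

def split_build_first(lines, build_first=("torch",)):
    """Split requirement lines into (install first, install afterwards)."""
    first, rest = [], []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # The project name is everything before the first version operator,
        # extras bracket, environment marker, or space.
        name = re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0].lower()
        (first if name in build_first else rest).append(line)
    return first, rest

first, rest = split_build_first(
    ["torch>=1.10,<2.0.0", "bmtrain>=0.2.1", "# a comment", ""]
)
print(first)  # requirements for the first pip run
print(rest)   # requirements to install once torch is importable
```

Each list can then be written to its own file and passed to `pip install -r` in turn.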
Attempt 2:
1. Manually installed PyTorch 2.0 built for CUDA 11.8; torch.cuda.is_available() returns True.
2. Manually installed every package in requirements.txt except torch.
For example: pip install "bmtrain>=0.2.1"
This fails with:
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [102 lines of output]
running bdist_wheel
/root/anaconda3/envs/CPM-Bee/lib/python3.10/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-310
creating build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/checkpointing.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/init.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/block_layer.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/debug.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/param_init.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/global_var.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/synchronize.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/parameter.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/utils.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/pipe_layer.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/store.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/layer.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/wrapper.py -> build/lib.linux-x86_64-cpython-310/bmtrain
creating build/lib.linux-x86_64-cpython-310/bmtrain/loss
copying bmtrain/loss/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/loss
copying bmtrain/loss/cross_entropy.py -> build/lib.linux-x86_64-cpython-310/bmtrain/loss
creating build/lib.linux-x86_64-cpython-310/bmtrain/optim
copying bmtrain/optim/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
copying bmtrain/optim/optim_manager.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
copying bmtrain/optim/adam.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
copying bmtrain/optim/adam_offload.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
creating build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
copying bmtrain/benchmark/shape.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
copying bmtrain/benchmark/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
copying bmtrain/benchmark/send_recv.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
copying bmtrain/benchmark/utils.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
copying bmtrain/benchmark/all_gather.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
copying bmtrain/benchmark/reduce_scatter.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
creating build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/linear.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/cosine.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/noam.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/warmup.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/no_decay.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/exponential.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
creating build/lib.linux-x86_64-cpython-310/bmtrain/distributed
copying bmtrain/distributed/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/distributed
copying bmtrain/distributed/ops.py -> build/lib.linux-x86_64-cpython-310/bmtrain/distributed
creating build/lib.linux-x86_64-cpython-310/bmtrain/inspect
copying bmtrain/inspect/model.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
copying bmtrain/inspect/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
copying bmtrain/inspect/tensor.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
copying bmtrain/inspect/format.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
creating build/lib.linux-x86_64-cpython-310/bmtrain/nccl
copying bmtrain/nccl/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/nccl
copying bmtrain/nccl/enums.py -> build/lib.linux-x86_64-cpython-310/bmtrain/nccl
running build_ext
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-b8c4nobc/bmtrain_56a2d1d4a744452aa5d4e776460000db/setup.py", line 74, in <module>
setup(
File "/root/anaconda3/envs/CPM-Bee/lib/python3.10/site-packages/setuptools/__init__.py", line 107, in setup
return distutils.core.setup(**attrs)
File "/root/anaconda3/envs/CPM-Bee/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/root/anaconda3/envs/CPM-Bee/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/root/anaconda3/envs/CPM-Bee/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/root/anaconda3/envs/CPM-Bee/lib/python3.10/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/root/anaconda3/envs/CPM-Bee/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/root/anaconda3/envs/CPM-Bee/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 325, in run
self.run_command("build")
File "/root/anaconda3/envs/CPM-Bee/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/root/anaconda3/envs/CPM-Bee/lib/python3.10/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/root/anaconda3/envs/CPM-Bee/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/root/anaconda3/envs/CPM-Bee/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 131, in run
self.run_command(cmd_name)
File "/root/anaconda3/envs/CPM-Bee/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/root/anaconda3/envs/CPM-Bee/lib/python3.10/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/root/anaconda3/envs/CPM-Bee/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/root/anaconda3/envs/CPM-Bee/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/root/anaconda3/envs/CPM-Bee/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
self.build_extensions()
File "/root/anaconda3/envs/CPM-Bee/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 499, in build_extensions
_check_cuda_version(compiler_name, compiler_version)
File "/root/anaconda3/envs/CPM-Bee/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 387, in _check_cuda_version
raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
RuntimeError:
The detected CUDA version (12.1) mismatches the version that was used to compile
PyTorch (11.8). Please make sure to use the same CUDA versions.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for bmtrain
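In short, the versions are incompatible. So how should bmtrain be installed?
Attempt 3:
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
The same error as above still appears. It seems to check the GPU driver version; I have 12.1.0 installed. Switching to 11.7 might fix it, but that is too much trouble, so I gave up.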
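The `+cu117` suffix in the command above names the CUDA version the wheel was built against, which is independent of whichever toolkit is installed on the machine. A small sketch that decodes such a suffix, assuming the convention that the last digit is the minor version (the function name is hypothetical):

```python
import re

def cuda_from_wheel_version(version):
    """Return (major, minor) from a version such as '1.13.1+cu117'.

    Returns None when the version has no '+cuNNN' suffix (for example CPU builds).
    """
    match = re.search(r"\+cu(\d+)(\d)$", version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

print(cuda_from_wheel_version("1.13.1+cu117"))
```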
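First, this happens because building a CUDA extension through Torch requires the installed CUDA Toolkit version to match the CUDA version that PyTorch itself was compiled with; it has nothing to do with the driver. Second, is installing CUDA really that hard?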
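This comparison can be reproduced before starting a build. The sketch below parses the release line printed by `nvcc --version` and compares it with the string held in `torch.version.cuda`. The sample inputs mirror the versions reported in this thread, and the helper names are illustrative assumptions rather than PyTorch APIs:

```python
import re

def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None

def parse_torch_cuda(torch_cuda):
    """Parse a torch.version.cuda string such as '11.8' (None on CPU-only builds)."""
    if not torch_cuda:
        return None
    major, minor = torch_cuda.split(".")[:2]
    return int(major), int(minor)

def toolkit_matches_torch(nvcc_output, torch_cuda):
    """True only when both versions are known and identical."""
    toolkit = parse_nvcc_release(nvcc_output)
    built = parse_torch_cuda(torch_cuda)
    return toolkit is not None and toolkit == built

# Sample text shaped like nvcc's output; the build number is made up.
sample_nvcc = "Cuda compilation tools, release 12.1, V12.1.105"
print(toolkit_matches_torch(sample_nvcc, "11.8"))
```

On a real machine the two inputs would come from running `nvcc --version` and reading `torch.version.cuda`. The CUDA version printed by `nvidia-smi` is the highest version the driver supports, which is a separate number.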
I am on Windows with CUDA 11.7:
python setup.py install
running install
D:\ProgramData\anaconda3\envs\cpm\lib\site-packages\setuptools\_distutils\cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!
********************************************************************************
Please avoid running ``setup.py`` directly.
Instead, use pypa/build, pypa/installer, pypa/build or
other standards-based tools.
See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
********************************************************************************
!!
self.initialize_options()
D:\ProgramData\anaconda3\envs\cpm\lib\site-packages\setuptools\_distutils\cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!
********************************************************************************
Please avoid running ``setup.py`` and ``easy_install``.
Instead, use pypa/build, pypa/installer, pypa/build or
other standards-based tools.
See https://github.com/pypa/setuptools/issues/917 for details.
********************************************************************************
!!
self.initialize_options()
running bdist_egg
running egg_info
writing bmtrain.egg-info\PKG-INFO
writing dependency_links to bmtrain.egg-info\dependency_links.txt
writing requirements to bmtrain.egg-info\requires.txt
writing top-level names to bmtrain.egg-info\top_level.txt
reading manifest file 'bmtrain.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'bmtrain.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_py
running build_ext
error: [WinError 2] The system cannot find the file specified.
Installing with build also fails:
python -m build -n -x -w
- Building wheel...
running bdist_wheel
running build
running build_py
running build_ext
error: [WinError 2] The system cannot find the file specified.
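`[WinError 2]` during `build_ext` means that a file or program the build tried to launch was not found; the log does not say which one. One possible cause, and this is only an assumption, is that a required build tool is not on `PATH`. A minimal sketch that reports which of a few candidate tools cannot be located (the tool list is a guess, not taken from bmtrain's setup.py):

```python
import shutil

def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]

# Candidate tools for a CUDA extension build on Windows (an assumption).
print(missing_tools(["nvcc", "cl", "ninja"]))
```

An empty list would rule this explanation out and point to some other missing file.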
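Same question: with cuda=12.0, python=3.7, torch=1.13, pip install bmtrain just fails and cannot be installed.
Same error here; it will not install.
Same error here; it will not install.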
https://github.com/OpenBMB/CPM-Bee/issues/59#issuecomment-1583979663
This is a conda environment that I use:
conda create --prefix $(pwd)/.conda_env pytorch==1.13.1 pytorch-cuda=11.6 libcusolver-dev -c pytorch -c nvidia
bdist
How about installing bdist first and then installing bmtrain?
Windows 10 environment; pip install bmtrain fails with:
.......
nccl.obj : error LNK2001: unresolved external symbol ncclCommInitRank
nccl.obj : error LNK2001: unresolved external symbol ncclReduce
nccl.obj : error LNK2001: unresolved external symbol ncclRecv
nccl.obj : error LNK2001: unresolved external symbol ncclGroupEnd
nccl.obj : error LNK2001: unresolved external symbol ncclSend
nccl.obj : error LNK2001: unresolved external symbol ncclCommCount
nccl.obj : error LNK2001: unresolved external symbol ncclGetUniqueId
nccl.obj : error LNK2001: unresolved external symbol ncclCommDestroy
nccl.obj : error LNK2001: unresolved external symbol ncclBroadcast
nccl.obj : error LNK2001: unresolved external symbol ncclGroupStart
nccl.obj : error LNK2001: unresolved external symbol ncclCommUserRank
nccl.obj : error LNK2001: unresolved external symbol ncclReduceScatter
nccl.obj : error LNK2001: unresolved external symbol ncclAllGather
nccl.obj : error LNK2001: unresolved external symbol ncclAllReduce
nccl.obj : error LNK2001: unresolved external symbol ncclGetErrorString
build\lib.win-amd64-cpython-38\bmtrain\nccl_C.cp38-win_amd64.pyd : fatal error LNK1120: 15 unresolved externals
error: command 'D:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\bin\HostX86\x64\link.exe' failed with exit code 1120
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for bmtrain
Running setup.py clean for bmtrain
Failed to build bmtrain
ERROR: Could not build wheels for bmtrain, which is required to install pyproject.toml-based projects
Installing bmtrain keeps failing with this error. I have tried installing both VS2019 and VS2022. How should I install it? cuda==11.7, torch==1.13
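Every unresolved symbol in the linker output above is an NCCL function, which suggests that the NCCL library was not found at link time rather than a problem with the Visual Studio installation. NVIDIA distributes NCCL for Linux, so the package may not be buildable on Windows as-is; that is an inference from the log, not something confirmed in this thread. A sketch that pulls the symbol names out of such a log, assuming one LNK2001 message per line:

```python
def unresolved_symbols(log):
    """Collect the last token of every LNK2001 line, which is the symbol name."""
    return [line.split()[-1] for line in log.splitlines() if "LNK2001" in line]

# Shortened sample in the same shape as the log above.
sample = """nccl.obj : error LNK2001: unresolved external symbol ncclCommInitRank
nccl.obj : error LNK2001: unresolved external symbol ncclReduce
fatal error LNK1120: 2 unresolved externals"""

symbols = unresolved_symbols(sample)
print(symbols)
print(all(name.startswith("nccl") for name in symbols))
```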