[Bug] 5090 GPU is not supported

Open 1273603741 opened this issue 7 months ago • 25 comments

Prerequisite

  • [x] I have searched Issues and Discussions but cannot get the expected help.
  • [x] The bug has not been fixed in the latest version (https://github.com/open-mmlab/mmcv).

Environment

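I cannot get an environment set up on the 5090 at all; whatever I try, it reports that the mmcv library is missing. Please add support for the 5090 with CUDA 12.8 soon.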

Reproduces the problem - code sample

Traceback (most recent call last):
  File "./tools/train.py", line 287, in <module>
    main()
  File "./tools/train.py", line 276, in main
    train_model(
  File "/home/super/lwy/rcbevdet/mmdet3d/apis/train.py", line 351, in train_model
    train_detector(
  File "/home/super/lwy/rcbevdet/mmdet3d/apis/train.py", line 227, in train_detector
    model = MMDistributedDataParallel(
  File "/home/super/anaconda3/envs/rcbevdet/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 646, in __init__
    _verify_param_shape_across_processes(self.process_group, parameters)
  File "/home/super/anaconda3/envs/rcbevdet/lib/python3.8/site-packages/torch/distributed/utils.py", line 89, in _verify_param_shape_across_processes
    return dist._verify_params_across_processes(process_group, tensors, logger)
RuntimeError: CUDA error: invalid device function
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 899262) of binary: /home/super/anaconda3/envs/rcbevdet/bin/python

Reproduces the problem - command or script

Add support for the 5090 GPU

Reproduces the problem - error message

Same traceback as under "Reproduces the problem - code sample" above, ending in:
RuntimeError: CUDA error: invalid device function

Additional information

No response

1273603741 • May 16 '25 09:05

I don't know whether this project is still alive. 50-series GPUs with cu128 throw all kinds of errors.

wilsonlv • May 26 '25 01:05

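> I don't know whether this project is still alive. 50-series GPUs with cu128 throw all kinds of errors.

You can try mmcv 1.7.2; some things in it need to be changed before it works with 50-series GPUs.

1273603741 • May 26 '25 09:05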

> You can try mmcv 1.7.2; some things in it need to be changed before it works with 50-series GPUs.

OP, how exactly did you solve it?

wilsonlv • May 27 '25 01:05

> OP, how exactly did you solve it?

I downloaded mmcv 1.7.2 and changed some things in it, and then the code also had to be adapted to a newer PyTorch version. It was quite a hassle and took 3 hours to get working.

1273603741 • May 27 '25 01:05

> I downloaded mmcv 1.7.2 and changed some things in it, and then the code also had to be adapted to a newer PyTorch version. It was quite a hassle and took 3 hours to get working.

Could you please share the modified source code? Thanks.

wilsonlv • May 27 '25 01:05

> I downloaded mmcv 1.7.2 and changed some things in it, and then the code also had to be adapted to a newer PyTorch version. It was quite a hassle and took 3 hours to get working.

Could you put up a git repo to share the code? Thanks a lot.

jzy12312 • May 27 '25 08:05

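> I downloaded mmcv 1.7.2 and changed some things in it, and then the code also had to be adapted to a newer PyTorch version. It was quite a hassle and took 3 hours to get working.

OP, how did you solve this? I would really like to switch to a new 50-series GPU to run my earlier projects. I was previously using mmcv 0.30.0.

516525465 • May 27 '25 18:05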

> OP, how did you solve this? I would really like to switch to a new 50-series GPU to run my earlier projects. I was previously using mmcv 0.30.0.

I found a technician to help me solve it. Some things in the mmcv code were changed; it was quite a hassle.

1273603741 • May 28 '25 09:05

Figured it out, folks. I have written up the steps; you can refer to: https://gitee.com/Wilson_Lws/MuseTalk-50Series-Adaptation/blob/master/README.md

wilsonlv • May 29 '25 02:05

> Figured it out, folks. I have written up the steps; you can refer to: https://gitee.com/Wilson_Lws/MuseTalk-50Series-Adaptation/blob/master/README.md

Awesome!

1273603741 • May 29 '25 02:05

Building and installing from source by following the official documentation worked for me, inside WSL2 with CUDA 12.8 + PyTorch 2.7.

shyhoom • Jun 05 '25 12:06

Just follow the build-from-source instructions in the official documentation: https://mmcv.readthedocs.io/zh-cn/latest/get_started/build.html

TorchVision: 0.22.1+cu128
OpenCV: 4.11.0
MMEngine: 0.10.7
MMCV: 2.2.0
MMCV Compiler: GCC 11.4
MMCV CUDA Compiler: 12.8

foundnom • Jun 20 '25 06:06

Building according to the official documentation gives this error:

Image

STHxiao • Jul 28 '25 15:07

Update: the second line of the .bashrc shown here has to be commented out before the build succeeds.

Image

STHxiao • Jul 28 '25 17:07

LINK : fatal error LNK1181: cannot open input file 'E:\ziliao\mmcv\build\temp.win-amd64-cpython-310\Release\mmcv\ops\csrc\pytorch\cpu\active_rotated_filter.obj'
error: command 'E:\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64\link.exe' failed with exit code 1181

5060, Python 3.10, CUDA 12.8. Following the official build instructions, I got this far and cannot get any further.

Xudada-5566 • Jul 31 '25 17:07

> 5060, Python 3.10, CUDA 12.8. Following the official build instructions, I got this far and cannot get any further.

Use WSL2; it probably won't work on native Windows.

STHxiao • Aug 01 '25 07:08

The build from source reports a successful install, but running the test file fails because mmcv._ext cannot be found.

Image

YGJing7 • Sep 25 '25 03:09

> The build from source reports a successful install, but running the test file fails because mmcv._ext cannot be found.

I ran into this problem too. How can it be solved?

JillianaaaCHEN • Sep 29 '25 08:09

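> Building and installing from source by following the official documentation worked for me, inside WSL2 with CUDA 12.8 + PyTorch 2.7.

I built and installed both mmcv and mmsegmentation from source following the official documentation, but the verification step from the docs fails with AssertionError: MMCV==2.2.0 is used but incompatible. Please install mmcv>=2.0.0rc4. I have not found a fix so far. When building mmcv from source, is it still possible to pick version 2.0.0 or 2.1.0?

magirisyang • Oct 08 '25 05:10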

> Just follow the build-from-source instructions in the official documentation: https://mmcv.readthedocs.io/zh-cn/latest/get_started/build.html
>
> TorchVision: 0.22.1+cu128
> OpenCV: 4.11.0
> MMEngine: 0.10.7
> MMCV: 2.2.0
> MMCV Compiler: GCC 11.4
> MMCV CUDA Compiler: 12.8

I built and installed both mmcv and mmsegmentation from source following the official documentation, but the verification step from the docs fails with AssertionError: MMCV==2.2.0 is used but incompatible. Please install mmcv>=2.0.0rc4. I have not found a fix so far. When building mmcv from source, is it still possible to pick version 2.0.0 or 2.1.0?

magirisyang • Oct 08 '25 05:10

> I built and installed both mmcv and mmsegmentation from source following the official documentation, but the verification step from the docs fails with AssertionError: MMCV==2.2.0 is used but incompatible. Please install mmcv>=2.0.0rc4. I have not found a fix so far. When building mmcv from source, is it still possible to pick version 2.0.0 or 2.1.0?

Just switch to 2.0.0rc4 and build that.
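
The assertion message quoted above suggests that the downstream package accepts only a range of mmcv versions and that 2.2.0 falls outside it. Below is a minimal Python sketch of that kind of check. The helper names `version_key` and `mmcv_in_range` are hypothetical, and the default bounds (minimum 2.0.0rc4 inclusive, maximum 2.2.0 exclusive) are an assumption inferred from the error text; the real bounds should be read from the installed package.

```python
import re


def version_key(version):
    """Turn a version such as '2.0.0rc4' into a sortable tuple.

    Only the simple 'X.Y.Z' and 'X.Y.ZrcN' forms are handled here.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?", version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch, rc = match.groups()
    # A final release sorts after all of its release candidates.
    rc_rank = int(rc) if rc is not None else float("inf")
    return (int(major), int(minor), int(patch), rc_rank)


def mmcv_in_range(installed, minimum="2.0.0rc4", maximum="2.2.0"):
    """Assumed check: minimum <= installed < maximum (upper bound excluded)."""
    return version_key(minimum) <= version_key(installed) < version_key(maximum)


if __name__ == "__main__":
    for candidate in ("2.0.0rc4", "2.1.0", "2.2.0"):
        print(candidate, mmcv_in_range(candidate))
```

Under these assumed bounds, 2.0.0rc4 and 2.1.0 pass while 2.2.0 does not, which is consistent with the advice to build an older version.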

foundnom • Oct 09 '25 01:10

Hi guys, I am using an RTX 5090 and I am now able to build mmcv and mmdet in my environment.

Here is a screenshot of my env:

Image

The commands to install mmcv and mmdet:

conda create -n my_env python=3.10
conda activate my_env
pip install scipy==1.15

Building mmdet fails while installing scipy, so we install scipy first to avoid that.

Now we install mmengine, mmcv, and mmdet. Make sure you have successfully installed PyTorch and that your PyTorch version works with sm_120 on the RTX 50 series.

git clone https://github.com/open-mmlab/mmengine.git && cd mmengine
python3 setup.py install && cd ..

git clone https://github.com/open-mmlab/mmcv.git && cd mmcv
python3 setup.py install && cd ..

git clone https://github.com/open-mmlab/mmdetection.git && cd mmdetection
python3 setup.py install && cd ..
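
For the sm_120 prerequisite mentioned above, here is a minimal Python sketch of how one might check it before building. The helper name `arch_supported` is hypothetical (it is not part of PyTorch or mmcv). The sketch assumes only that `torch.cuda.get_arch_list()` returns entries such as `sm_90` or `compute_90` and that `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple. A missing architecture is one common cause of "CUDA error: invalid device function", though not necessarily the only one.

```python
import re


def arch_supported(capability, arch_list):
    """Return True if arch_list holds code usable on the given device.

    capability: (major, minor) compute capability, e.g. (12, 0) for sm_120.
    arch_list:  strings such as 'sm_90' (binary) or 'compute_90' (PTX).
    """
    dev_major, dev_minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"(sm|compute)_(\d+)(\d)([a-z]?)", arch)
        if match is None:
            continue  # ignore entries this sketch does not understand
        kind, suffix = match.group(1), match.group(4)
        major, minor = int(match.group(2)), int(match.group(3))
        if suffix:
            # Suffixed targets such as sm_90a are treated as exact-match only.
            if (major, minor) == (dev_major, dev_minor):
                return True
            continue
        # Binary code runs on the same major version with an equal or newer minor.
        if kind == "sm" and major == dev_major and minor <= dev_minor:
            return True
        # PTX can be JIT-compiled for any equal or newer architecture.
        if kind == "compute" and (major, minor) <= (dev_major, dev_minor):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            print(cap, arch_supported(cap, torch.cuda.get_arch_list()))
        else:
            print("No CUDA device is visible to PyTorch")
```

If this prints False for the card, the installed PyTorch build probably has no kernels for it, and rebuilding mmcv alone is unlikely to help.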

GiaKhangLuu • Dec 06 '25 09:12

> The build from source reports a successful install, but running the test file fails because mmcv._ext cannot be found.

Same here. Did you manage to solve it?

didier404 • Dec 08 '25 09:12

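> The build from source reports a successful install, but running the test file fails because mmcv._ext cannot be found.
>
> Same here. Did you manage to solve it?

Instead of running

pip install -e . -v

as in the tutorial, you should use

python3 setup.py install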
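
To see whether the compiled extension is visible after either install method, a small Python sketch like the following may help. The helper name `locate_extension` is hypothetical; it relies only on the standard library's `importlib.util.find_spec`, and `mmcv._ext` is the module name taken from the error reported above.

```python
import importlib.util


def locate_extension(module_name="mmcv._ext"):
    """Return the file path Python would load module_name from, or None.

    Looking up a submodule imports its parent package first, so a missing
    parent package is reported as None rather than raised.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    path = locate_extension()
    if path is None:
        print("mmcv._ext was not found on the current import path")
    else:
        print("mmcv._ext resolves to", path)
```

A path here only shows that the file can be found; the import itself can still fail if the extension was built against a different PyTorch or CUDA version.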

GiaKhangLuu • Dec 08 '25 09:12

> Instead of running pip install -e . -v as in the tutorial, you should use python3 setup.py install

That worked for me. Thank you!

didier404 • Dec 09 '25 08:12