Setting up DCNv2 properly on RTX 4000 series GPUs

Open mayhemsloth opened this issue 1 year ago • 4 comments

I am attempting to fine-tune a reference-based super-resolution model on a personal, custom dataset. However, I am unable to install and run any of the codebases that depend on the C2-Matching codebase (C2-Matching, DATSR, AMSA, MRefSR) because the DCNv2 installation fails.

I consistently run into an error at this step:

cd mmsr/models/archs/DCNv2
python setup.py build develop

where PyTorch is seemingly missing the THC/THC.h header (see this PyTorch forums thread and this GitHub issue comment), which was removed in PyTorch 1.11. Downgrading below PyTorch 1.11 does not help either: according to the previous versions page, the latest CUDA version available for 1.10.1 is CUDA 11.3, but the minimum CUDA version supported on RTX 4000 series GPUs is CUDA 11.8. I have tried many combinations of PyTorch and CUDA versions, and nothing has worked so far.

What is the solution to this? How can I get DCNv2 to compile correctly so that the layer is usable? I would very much like to run SOTA architectures that are barely one year old on the most recent GPUs.
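The version bind described above can be written down as a small check. This is only a sketch: the two thresholds are taken from this post (THC headers gone from PyTorch 1.11 on, CUDA 11.8 as the minimum for RTX 4000 series), and `parse_version` and `diagnose` are hypothetical helper names, not part of PyTorch.

```python
import re

# Thresholds as stated in this issue; treat them as assumptions.
THC_REMOVED_IN = (1, 11)   # first PyTorch release without THC/THC.h
ADA_MIN_CUDA = (11, 8)     # minimum CUDA version for RTX 4000 series


def parse_version(text):
    """Turn a string such as '1.10.1+cu113' into a tuple such as (1, 10, 1)."""
    numbers = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def diagnose(torch_version, cuda_version):
    """List the reasons the legacy DCNv2 build cannot work with this pair."""
    problems = []
    if parse_version(torch_version)[:2] >= THC_REMOVED_IN:
        problems.append("THC headers missing: legacy DCNv2 extension will not compile")
    if parse_version(cuda_version)[:2] < ADA_MIN_CUDA:
        problems.append("CUDA older than 11.8: RTX 4000 series not supported")
    return problems


if __name__ == "__main__":
    for pair in [("1.10.1+cu113", "11.3"), ("2.1.0+cu118", "11.8")]:
        print(pair, diagnose(*pair))
```

With PyTorch installed, the two arguments would come from `torch.__version__` and `torch.version.cuda` (the latter is `None` on CPU-only builds, which this sketch does not handle). Under the thresholds above, every pair fails at least one of the two checks, which is why no downgrade combination works for the unmodified extension.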

As someone versed in PyTorch but not in CUDA or C++ programming, can I somehow replace this DCNv2 layer with a torchvision layer?

mayhemsloth avatar Nov 17 '23 02:11 mayhemsloth

Hello, I have solved this problem for C2-Matching, and I guess it could be solved similarly for DATSR. However, I am using an RTX 3090 with CUDA 11.3, so I am not sure whether it works on the RTX 4000 series. You can try it here: https://github.com/include5636/C2-Matching-CUDA11 Hope this helps!

include5636 avatar Dec 14 '23 15:12 include5636

Can you run DATSR normally in your environment? Thank you.

heng0607 avatar Mar 11 '24 11:03 heng0607

@heng0607 Yes, and I found an easier way to solve this issue. You just need this:

pip install mmcv-full

This works because this line, https://github.com/caojiezhang/DATSR/blob/76faa616774cbef983acaa855eaaec75d5dc9d8c/datsr/models/archs/dcn_v2.py#L14, needs ModulatedDeformConv2d and modulated_deform_conv2d, which mmcv==0.4.4 (now mmcv-lite) does not contain, so you need mmcv-full. Hope this helps! If you run into any problems, feel free to contact me.
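A quick way to tell whether the installed mmcv actually provides these two names is to try the import. This is a sketch under the assumption that they live in `mmcv.ops`, as they do in mmcv-full 1.x; `load_mmcv_dcn` is a hypothetical helper name.

```python
def load_mmcv_dcn():
    """Return (ModulatedDeformConv2d, modulated_deform_conv2d), or None.

    None means mmcv is absent or was installed without its compiled ops,
    in which case installing mmcv-full is the fix suggested above.
    """
    try:
        from mmcv.ops import ModulatedDeformConv2d, modulated_deform_conv2d
    except ImportError:
        return None
    return ModulatedDeformConv2d, modulated_deform_conv2d


if __name__ == "__main__":
    if load_mmcv_dcn() is None:
        print("mmcv ops not available; try: pip install mmcv-full")
    else:
        print("mmcv deformable conv ops found")
```

Running this before training makes the failure show up as a clear message instead of an import error deep inside the model code.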

include5636 avatar Mar 12 '24 10:03 include5636

I had already solved the problem before I saw your reply, but thank you for your help.

heng0607 avatar Mar 13 '24 01:03 heng0607