
Improved a small piece of code that could cause CUDA-related errors when Mamba modules are introduced.

Open DaozeTang opened this issue 1 year ago • 14 comments

This change is mainly a fix for a problem I ran into.

In short, my CUDA environment and the stock code worked fine. However, when I tried to make architectural improvements on top of Ultralytics' YOLOv8 by introducing Mamba-related modules, I hit a CUDA error and could not continue running.

The specific error message is shown below.

[screenshot of the CUDA error message]

This change avoids the error above by adding a check so that the stride computation can fall back to running on CUDA.
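
For illustration only, here is a minimal sketch of the kind of fallback described, reusing names from the surrounding ultralytics/nn/tasks.py context (m, _forward, ch, s); it is not the exact patch:

try:
    # Default path: build the dummy input on CPU, as the stock code does.
    m.stride = torch.tensor([s / x.shape[-2] for x in _forward(torch.zeros((1, ch, s, s)))])
except RuntimeError:
    if not torch.cuda.is_available():
        raise  # no GPU to fall back to, so surface the original error
    # Some externally introduced modules (e.g. Mamba CUDA kernels) refuse to run on CPU,
    # so retry the stride computation with the dummy input on CUDA.
    m.stride = torch.tensor([s / x.shape[-2] for x in _forward(torch.zeros((1, ch, s, s), device="cuda"))])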

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced stride computation with CPU/GPU fallback handling for better hardware compatibility.

📊 Key Changes

  • 🛠️ Added try-except blocks to handle potential RuntimeError during stride computation on CPU.
  • 🚀 Introduced a fallback mechanism to attempt stride computation on CUDA (GPU) if CPU fails.
  • 🐛 Properly raise the caught RuntimeError if both CPU and CUDA fail.

🎯 Purpose & Impact

  • 🔧 Improved Robustness: Ensures stride computation doesn't fail outright on CPU, offering fallback to GPU.
  • 💻 Hardware Compatibility: Enhances the model's adaptability to different hardware setups, ensuring smoother operations regardless of available resources.
  • 📈 User Experience: Reduces potential errors during setup, facilitating a more seamless workflow for users.

DaozeTang avatar Jul 11 '24 23:07 DaozeTang

All Contributors have signed the CLA. ✅
Posted by the CLA Assistant Lite bot.

github-actions[bot] avatar Jul 11 '24 23:07 github-actions[bot]

I have read the CLA Document and I sign the CLA

DaozeTang avatar Jul 11 '24 23:07 DaozeTang

@Tdzdele is there a device mismatch here, i.e. is the model on CUDA and the input tensor on CPU?

Would it be simpler to always convert the input tensor to the model device on this line?

glenn-jocher avatar Jul 12 '24 09:07 glenn-jocher

@Tdzdele I reviewed the code here and I don't see any opportunity for any modules to be on CUDA at the time of this line. If you are introducing CUDA modules here then I would advise you to convert them to CPU first to join the other CPU modules.
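
As a hypothetical illustration of that advice (ExternalBlock is a stand-in for whatever third-party module is being added, not an Ultralytics class):

import torch
import torch.nn as nn

class ExternalBlock(nn.Module):
    """Stand-in for an externally added module (e.g. a Mamba layer)."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

block = ExternalBlock()
if torch.cuda.is_available():
    block = block.cuda()  # simulate a module that was created on the GPU elsewhere

# Move it back to CPU before it joins the other CPU modules, so the CPU dummy
# input used for stride computation does not trigger a device-mismatch error.
block = block.cpu()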

glenn-jocher avatar Jul 12 '24 09:07 glenn-jocher

@glenn-jocher Yes, the original code of the ultralytics framework does not cause problems here. The issue mainly arises when introducing external modules into an ultralytics model: the external modules themselves, such as Mamba-related modules or Deformable Convolution v4 (DCNv4) modules, can trigger errors. This modification avoids some of those potential problems simply and effectively. I think it improves the compatibility of the ultralytics framework, lowers the barrier to development, and makes it easier for more developers to build integrations on top of the framework.

DaozeTang avatar Jul 12 '24 09:07 DaozeTang

@Tdzdele yes I see that, but try/except blocks are not good practice here because they don't actually address the root of the problem.

If the model is on one device and the input is on another, that should be solvable without resorting to try/except:

device = next(m.parameters()).device  # e.g. one way to get the model device
m.stride = torch.tensor([s / x.shape[-2] for x in _forward(torch.zeros((1, ch, s, s), device=device))])

If the issue is that the user is introducing modules that are mixed between CPU and GPU, that is user error, and it is their responsibility to address it.

glenn-jocher avatar Jul 12 '24 10:07 glenn-jocher

@glenn-jocher I got it, thanks very much for the comments!

DaozeTang avatar Jul 12 '24 11:07 DaozeTang

You're welcome! 😊 If you encounter any further issues or have additional questions, feel free to reach out. Also, if you haven't already, please ensure you're using the latest versions of the packages to benefit from recent fixes and improvements. For any reproducible examples, you can refer to our minimum reproducible example guide. This helps us diagnose and resolve issues more effectively. Happy coding! 🚀

glenn-jocher avatar Jul 12 '24 18:07 glenn-jocher

I'm curious how you are trying to use selective scan to improve Ultralytics with Mamba. Are you trying to make a Mamba-based object detector?

rolson24 avatar Jul 20 '24 00:07 rolson24

Thank you for your interest! It sounds like you're exploring innovative ways to enhance Ultralytics with Mamba. If you're integrating Mamba-based modules or selective scan techniques, ensuring compatibility with the latest versions of our packages is crucial. This can help avoid potential issues and leverage recent improvements. If you encounter specific errors or need further assistance, please provide more details, and we'll be happy to help.

pderrenger avatar Jul 20 '24 08:07 pderrenger

Doesn't the mamba-ssm install require some fairly complex CUDA setup and configuration? Maybe that's not what you're using, @Tdzdele, but I was looking at this the other day, and that requirement seemed like it would make integration difficult. I bring it up partly because it may be related to your issue, and partly because if you're planning to open a PR later to integrate mamba-ssm (the fact that you opened this PR is the basis for that assumption), it would require users to jump through a lot of hoops to enable, so I figured it would be better to let you know now that it has a very low probability of being accepted. Reference

Burhan-Q avatar Jul 25 '24 15:07 Burhan-Q

I'm curious how you are trying to use selective scan to improve Ultralytics with Mamba. Are you trying to make a Mamba-based object detector?

@rolson24 Yes, previously I tried to fuse some of the structures in Mamba with the YOLO family. Although such an operation may bring some accuracy improvement, the complexity of Mamba's structures means that fusing them into the YOLO architecture drastically reduces the model's inference speed and is more likely to introduce bugs, so I think this is a difficult research direction.

DaozeTang avatar Jul 31 '24 17:07 DaozeTang

Doesn't the mamba-ssm install require some fairly complex CUDA setup and configuration? Maybe that's not what you're using, @Tdzdele, but I was looking at this the other day, and that requirement seemed like it would make integration difficult. I bring it up partly because it may be related to your issue, and partly because if you're planning to open a PR later to integrate mamba-ssm (the fact that you opened this PR is the basis for that assumption), it would require users to jump through a lot of hoops to enable, so I figured it would be better to let you know now that it has a very low probability of being accepted. Reference

Okay. Thank you very much.

DaozeTang avatar Jul 31 '24 17:07 DaozeTang

Yes, integrating mamba-ssm can indeed involve complex CUDA configurations, which might complicate the setup for many users. If you're considering a PR for this integration, please be aware that such complexity could hinder its acceptance. We aim to maintain ease of use and broad compatibility in our framework. If you encounter specific issues or have further questions, feel free to share more details, and we'll do our best to assist you. Thank you for your understanding.

pderrenger avatar Jul 31 '24 20:07 pderrenger

Codecov Report

Attention: Patch coverage is 25.00000% with 6 lines in your changes missing coverage. Please review.

Project coverage is 73.98%. Comparing base (dbdb451) to head (060dd02). Report is 221 commits behind head on main.

Files with missing lines | Patch % | Lines
ultralytics/nn/tasks.py  | 25.00%  | 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #14371      +/-   ##
==========================================
+ Coverage   73.97%   73.98%   +0.01%     
==========================================
  Files         128      128              
  Lines       17118    17125       +7     
==========================================
+ Hits        12663    12670       +7     
  Misses       4455     4455              
Flag       | Coverage Δ
Benchmarks | 34.98% <0.00%> (+0.02%) ⬆️
GPU        | 39.07% <25.00%> (-0.02%) ⬇️
Tests      | 67.73% <25.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☂️ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

codecov[bot] avatar Aug 26 '24 01:08 codecov[bot]

You're correct, integrating mamba-ssm would indeed require complex CUDA configurations, which could complicate the setup for users. Given these challenges, such a PR would likely not be accepted. Thank you for understanding.

glenn-jocher avatar Aug 26 '24 06:08 glenn-jocher

Thank you very much! Your improvements are useful to me!

chenjie04 avatar Nov 08 '24 14:11 chenjie04

I'm glad to hear the improvements were helpful! If you have any more questions or need further assistance, feel free to ask.

glenn-jocher avatar Nov 08 '24 17:11 glenn-jocher