Improved a small piece of code that could cause errors related to CUDA when Mamba modules are introduced.
This change is mainly a fix for a problem I ran into.
In short, my CUDA environment and the code worked fine with no problems. However, when I tried to make architectural improvements on top of Ultralytics' YOLOv8 by introducing Mamba-related modules, I hit a CUDA error and could not continue running.
The specific error message is shown below.
This change avoids the above error by adding a check that allows stride computation to run correctly on CUDA.
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Enhanced stride computation with CPU/GPU fallback handling for better hardware compatibility.
📊 Key Changes
- 🛠️ Added try-except blocks to handle a potential `RuntimeError` during stride computation on CPU.
- 🚀 Introduced a fallback mechanism to attempt stride computation on CUDA (GPU) if CPU fails.
- 🐛 Properly re-raise the caught `RuntimeError` if both CPU and CUDA fail.
🎯 Purpose & Impact
- 🔧 Improved Robustness: Ensures stride computation doesn't fail outright on CPU, offering fallback to GPU.
- 💻 Hardware Compatibility: Enhances the model's adaptability to different hardware setups, ensuring smoother operations regardless of available resources.
- 📈 User Experience: Reduces potential errors during setup, facilitating a more seamless workflow for users.
All Contributors have signed the CLA. ✅
Posted by the CLA Assistant Lite bot.
I have read the CLA Document and I sign the CLA
@Tdzdele is there a device mismatch here, i.e. is the model on CUDA and the input tensor on CPU?
Would it be simpler to always convert the input tensor to the model's device on this line?
@Tdzdele I reviewed the code here and I don't see any opportunity for any modules to be on CUDA at the time of this line. If you are introducing CUDA modules here then I would advise you to convert them to CPU first to join the other CPU modules.
@glenn-jocher Yes, the original code of the Ultralytics framework does not cause problems here. It is mainly when introducing external modules into an Ultralytics model that errors may be reported by the external modules themselves, such as Mamba-related modules or Deformable Convolution v4 (DCNv4) modules. With this modification, some potential problems can be avoided simply and effectively. I think this can improve the compatibility of the Ultralytics framework, lower some development barriers, and make it easier for more developers to build integrations on top of it.
@Tdzdele yes I see that, but try except blocks are not good practice as they don't actually address the root of the problem.
If the model is on one device and the input is on another, that should be solvable without resorting to try-except:
device = # code to get model device here
m.stride = torch.tensor([s / x.shape[-2] for x in _forward(torch.zeros((1, ch, s, s), device=device))])
If the issue is that the user is introducing modules that are mixed CPU and GPU that is user error and it is their responsibility to address that.
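For reference, the standard PyTorch idiom for the "get model device" placeholder above is to read the device of the model's first parameter. A minimal sketch, with a toy `Conv2d` standing in for the detection model:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, 3)  # toy stand-in for the detection model
device = next(model.parameters()).device  # device of the first parameter

# Build the dummy input directly on the model's device so the forward
# pass never mixes CPU and CUDA tensors
x = torch.zeros(1, 3, 256, 256, device=device)
out = model(x)
print(tuple(out.shape))  # (1, 16, 254, 254)
```

Note this assumes the model has at least one parameter; a buffers-only module would need `next(model.buffers()).device` instead.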
@glenn-jocher I got it, thanks very much for the comments!
You're welcome! 😊 If you encounter any further issues or have additional questions, feel free to reach out. Also, if you haven't already, please ensure you're using the latest versions of the packages to benefit from recent fixes and improvements. For any reproducible examples, you can refer to our minimum reproducible example guide. This helps us diagnose and resolve issues more effectively. Happy coding! 🚀
I'm curious how you are trying to use selective scan to improve Ultralytics with Mamba. Are you trying to make a Mamba-based object detector?
Thank you for your interest! It sounds like you're exploring innovative ways to enhance Ultralytics with Mamba. If you're integrating Mamba-based modules or selective scan techniques, ensuring compatibility with the latest versions of our packages is crucial. This can help avoid potential issues and leverage recent improvements. If you encounter specific errors or need further assistance, please provide more details, and we'll be happy to help.
Doesn't the mamba-ssm install require some complex CUDA setup/configurations? Maybe that's not what you're using @Tdzdele but I was looking at this the other day, and that seemed to be a requirement that would make integration difficult. I bring it up as maybe this is semi-related to your issue, but also b/c if you're planning to open a PR later to integrate mamba-ssm (the fact you opened this PR is the basis for this assumption) that would require users to jump through a lot of hoops to enable, I figured it would be better to let you know now that it has a very low probability of getting accepted. Reference
> I'm curious how you are trying to use selective scan to improve Ultralytics with Mamba. Are you trying to make a Mamba-based object detector?
@rolson24 Yes, previously I tried to fuse some of the structures in Mamba with the YOLO family. Although such an operation may bring some accuracy improvement, the complexity of Mamba's structures means that fusing them into the YOLO architecture drastically reduces the model's inference speed and is more likely to introduce bugs, so I think this is a difficult research direction.
> Doesn't the `mamba-ssm` install require some complex CUDA setup/configurations? Maybe that's not what you're using @Tdzdele but I was looking at this the other day, and that seemed to be a requirement that would make integration difficult. I bring it up as maybe this is semi-related to your issue, but also b/c if you're planning to open a PR later to integrate `mamba-ssm` (the fact you opened this PR is the basis for this assumption) that would require users to jump through a lot of hoops to enable, I figured it would be better to let you know now that it has a very low probability of getting accepted. Reference
Okay. Thank you very much.
Yes, integrating mamba-ssm can indeed involve complex CUDA configurations, which might complicate the setup for many users. If you're considering a PR for this integration, please be aware that such complexity could hinder its acceptance. We aim to maintain ease of use and broad compatibility in our framework. If you encounter specific issues or have further questions, feel free to share more details, and we'll do our best to assist you. Thank you for your understanding.
Codecov Report
Attention: Patch coverage is 25.00000% with 6 lines in your changes missing coverage. Please review.
Project coverage is 73.98%. Comparing base (`dbdb451`) to head (`060dd02`). Report is 221 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| ultralytics/nn/tasks.py | 25.00% | 6 Missing :warning: |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main   #14371      +/-   ##
==========================================
+ Coverage   73.97%   73.98%   +0.01%
==========================================
  Files         128      128
  Lines       17118    17125       +7
==========================================
+ Hits        12663    12670       +7
  Misses       4455     4455
```
| Flag | Coverage Δ | |
|---|---|---|
| Benchmarks | 34.98% <0.00%> (+0.02%) | :arrow_up: |
| GPU | 39.07% <25.00%> (-0.02%) | :arrow_down: |
| Tests | 67.73% <25.00%> (-0.03%) | :arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
You're correct, integrating mamba-ssm would indeed require complex CUDA configurations, which could complicate the setup for users. Given these challenges, such a PR would likely not be accepted. Thank you for understanding.
Thank you very much! Your improvements are useful to me!
I'm glad to hear the improvements were helpful! If you have any more questions or need further assistance, feel free to ask.
