opencv_contrib
disable npp in multistream context
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There are accuracy tests, performance tests and test data in the opencv_extra repository, if applicable. The patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
This PR addresses #3330
The issue is simple to reproduce: the code from this popular tutorial can be used; just replace cuda::resize with cuda::warpAffine and add some random affine matrix to go with it.
To the best of my knowledge, it does not make sense to use NPP with a multistream context, and the simple warp implementation works just fine.
As referenced in the issue, you can see that nppSetStream issues a lot of ioctl calls, and thus this function should only be used with the default context.
Alternatively, there could be a static compile-time warning in the NppWarp class, parametrized via static dispatch on whether it is called with Stream::Null(). I can probably try pulling that off; it may result in messier code, but the warning would fire at compile time.
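A minimal sketch of that reproduction, assuming an OpenCV build with the CUDA modules and a GPU; the matrix sizes, stream count, and transform coefficients below are illustrative, not taken from the original report:

```cpp
// Enqueue warpAffine on several independent streams; with the NPP-backed
// global-stream path, the calls end up serialized through nppSetStream.
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudawarping.hpp>
#include <vector>

int main()
{
    const int kStreams = 4;  // illustrative stream count
    std::vector<cv::cuda::Stream> streams(kStreams);

    cv::Mat host(2000, 2000, CV_8UC1);
    cv::randu(host, 0, 255);

    // The "random matrix" mentioned above: an arbitrary affine transform.
    cv::Mat M = (cv::Mat_<double>(2, 3) << 1.1, 0.1, 5.0,
                                          -0.1, 0.9, 3.0);

    std::vector<cv::cuda::GpuMat> src(kStreams), dst(kStreams);
    for (int i = 0; i < kStreams; ++i)
        src[i].upload(host, streams[i]);

    // Each warp is submitted on its own stream, yet the NPP path funnels
    // them through NPP's single global stream.
    for (int i = 0; i < kStreams; ++i)
        cv::cuda::warpAffine(src[i], dst[i], M, src[i].size(),
                             cv::INTER_LINEAR, cv::BORDER_CONSTANT,
                             cv::Scalar(), streams[i]);

    for (auto& s : streams)
        s.waitForCompletion();
    return 0;
}
```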
@awarebayes Whilst I am no fan of NPP, I believe it should offer the best performance going forward on newer architectures, compared with the OpenCV hand-rolled code, which was optimized for hardware from 10 years ago. If all the OpenCV CUDA routines could be updated for new hardware and CUDA SDK features as they are released, I would change my mind, but OpenCV is open source and the CUDA modules live in the contrib repository, so this is not realistic.
With that out of the way, would it not be better to replace the existing global-stream versions (e.g. nppiWarpAffine_8u_C1R) with the newer ones which accept an NppStreamContext (e.g. nppiWarpAffine_8u_C1R_Ctx)? I ran a quick test on a 2000x2000 matrix using these changes, and the performance of the NPP and OpenCV hand-rolled versions is about the same. I think these changes plus a #define guarding on the CUDA version will be sufficient. Then, if that works, all of the NPP routines could be updated to make proper use of streams.
A further improvement might be an additional flag which lets the user decide whether they want to use the NPP routines or not.
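A rough sketch of what routing a cv::cuda::Stream through the _Ctx variant could look like, assuming NPP from CUDA >= 10.1; the helper name is hypothetical, and only the stream field of the context is shown, whereas real code should populate the full NppStreamContext (device id, SM count, etc.):

```cpp
// Per-call stream context instead of the global nppSetStream: the _Ctx
// variant takes the stream with every call, so independent streams do not
// implicitly synchronize on NPP's global state.
#include <nppi_geometry_transforms.h>
#include <opencv2/core/cuda.hpp>
#include <opencv2/core/cuda_stream_accessor.hpp>

NppStatus warpOnStream(const cv::cuda::GpuMat& src, cv::cuda::GpuMat& dst,
                       const double coeffs[2][3], cv::cuda::Stream& stream)
{
    NppStreamContext ctx = {};  // abbreviated: only hStream is set here
    ctx.hStream = cv::cuda::StreamAccessor::getStream(stream);

    NppiSize size = { src.cols, src.rows };
    NppiRect roi  = { 0, 0, src.cols, src.rows };

    return nppiWarpAffine_8u_C1R_Ctx(src.ptr<Npp8u>(), size,
                                     static_cast<int>(src.step), roi,
                                     dst.ptr<Npp8u>(),
                                     static_cast<int>(dst.step), roi,
                                     coeffs, NPPI_INTER_LINEAR, ctx);
}
```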
@cudawarped Should I be the one to do it, i.e. replace the calls with the context version?
@awarebayes It's your pull request; I'm merely suggesting an alternative approach which in my opinion may be better. Ultimately it's up to the maintainers to decide. If you do apply the suggested changes, make sure to include the #define so the update only applies for CUDA Toolkit >= 10.1; see the release notes.
First I would check that the changes I suggested are as quick as your change for your specific problem.
I implemented compile-time static dispatch to select either the _Ctx or the non-_Ctx methods.
Tested: it does not perform as well as OpenCV's version because it uses cv::cuda::GpuMat::setTo to fill the image with the border color.
As for the details of the implementation:
I basically check whether the library version supports the _Ctx functions in a #define canUseContext.
If it does, I concatenate _Ctx to the given function names in the function table. If it doesn't, I concatenate nothing to the names.
Then I need to issue a call to one of the two different versions. I use static dispatch and Alexandrescu's Int2Type idiom with class templates to select the required version at compile time.
So everything is static and should not affect performance.
You may not like my use of concatenation macros; it can be avoided.
@alalek I'm not great with templates; is there a better way of implementing this change with them? I'm asking because, if this change is suitable, it could probably be applied to all the CUDA NPP routines.
Just out of interest, how long does support need to be offered for older CUDA releases? NppStreamContext (_Ctx) was added 3 years ago, so if support only has to cover 3 years, could the routines be updated to use just the newer NppStreamContext versions instead of both?
@awarebayes where did you get to with this?
I finally had a chance to take a look, and I'm not sure it makes sense to use templates in this case. I may be wrong, but the code we are trying to template is not another type but historic/unsafe functions, which should probably be removed in the future and are only included for users who are unable to upgrade to the latest version of CUDA.
I think something similar to https://github.com/opencv/opencv_contrib/blob/ec9d7bca56a9f2645ecb96d9b51460b003e9b7a5/modules/cudawarping/src/warp.cpp#L152 would suffice; what do you think?
I am pro removing the old code and just going with the new version.
Ifdefs can solve anything, but those are just macros and do not provide static dispatch.
There is an obvious problem for most people using new versions of NPP: if it breaks older code, welp :(
> Ifdefs can solve anything, but those are just macros and do not provide static dispatch

Can you explain further what you mean? Both the templated and the #define versions have one code path, determined at compile time based on the version of NPP being used. There is currently no flag to choose the older unsafe version.
> There is an obvious problem for most people using new versions of NPP: if it breaks older code, welp :(

This could be an issue, but isn't it an issue a user may unfortunately face every time they change the version of the CUDA SDK they are using?