opencv_contrib Try to Fix #3725: cudaarithm: fix the compile faiure of CUDA 12.

A slight API change of nppiMeanStdDevGetBufferHostSize_8u_C1R and nppiMeanStdDevGetBufferHostSize_32f_C1R in NPP of CUDA 12 has caused the #3725. I will try to fix this. I found that the type of bufSize is size_t instead of int in the reductions.cpp in the NPP header file, where the nppi_statistics_functions.h changed the type of second parameter from * int to * size_t.

nppi_statistics_functions.h 5392:5408

NppStatus 
nppiMeanStdDevGetBufferHostSize_32f_C1R_Ctx(NppiSize oSizeROI, size_t * hpBufferSize/* host pointer */, NppStreamContext nppStreamCtx);
/** 
 * Buffer size for \ref nppiMean_StdDev_32f_C1R.
 * 
 * For common parameter descriptions, see \ref CommonMeanStdDevGetBufferHostSizeParameters.
 *
 */
NppStatus 
nppiMeanStdDevGetBufferHostSize_32f_C1R(NppiSize oSizeROI, size_t * hpBufferSize/* host pointer */);

/** 
 * Buffer size for \ref nppiMean_StdDev_8u_C1MR.
 * 
 * For common parameter descriptions, see \ref CommonMeanStdDevGetBufferHostSizeParameters.
 *
 */

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

[x] I agree to contribute to the project under Apache 2 License.
[x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
[x] The PR is proposed to the proper branch
[x] There is a reference to the original bug report and related work
[ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name.
[ ] The feature is well documented and sample code can be built with the project CMake

Apr 22 '24 15:04 sdy623

/cc @cudawarped

Apr 22 '24 16:04 opencv-alalek

I checked the denfition of CUDA_VERSION in CUDA 12 which is the 12.XX.XX instad of a int number

@sdy623

I think you are confusing CMake generation and compilation. Adding that definition which is still for the incorrect verison of CUDA into the CMake file is unecessary.

Version Info

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda.h

> /**
>  * CUDA API version number
>  */
> #define CUDA_VERSION 12040

Function definitions

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\nppi_statistics_functions.h

NppStatus 
nppiMeanStdDevGetBufferHostSize_8u_C1R_Ctx(NppiSize oSizeROI, int * hpBufferSize/* host pointer */, NppStreamContext nppStreamCtx);

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\nppi_statistics_functions.h

NppStatus 
nppiMeanStdDevGetBufferHostSize_8u_C1R_Ctx(NppiSize oSizeROI, size_t * hpBufferSize/* host pointer */, NppStreamContext nppStreamCtx);

Apr 23 '24 05:04 cudawarped

<int &, int &>, int, cuda::std::__4::tuple<cv::cudev::minimum, cv::cudev::maximum>) blockReduce<BLOCK_SIZE>(smem_tuple(sminval, smaxval),

I checked the denfition of CUDA_VERSION in CUDA 12 which is the 12.XX.XX instad of a int number

@sdy623

I think you are confusing CMake generation and compilation. Adding that definition which is still for the incorrect verison of CUDA into the CMake file is unecessary.

Version Info

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda.h
> /**
>  * CUDA API version number
>  */
> #define CUDA_VERSION 12040
Function definitions

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\nppi_statistics_functions.h
NppStatus 
nppiMeanStdDevGetBufferHostSize_8u_C1R_Ctx(NppiSize oSizeROI, int * hpBufferSize/* host pointer */, NppStreamContext nppStreamCtx);
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\nppi_statistics_functions.h
NppStatus 
nppiMeanStdDevGetBufferHostSize_8u_C1R_Ctx(NppiSize oSizeROI, size_t * hpBufferSize/* host pointer */, NppStreamContext nppStreamCtx);

I deleted the edit of the CMake and find the correct CUDA_VERSION is 12040 in CUDA 12.4. I found a way to check the cuda.h without install it, just use 7-Zip to open the cuda_12.4.1_551.78_windows.exe and find this directory cuda_12.4.1_551.78_windows.exe\cuda_cudart\cudart\include\cuda.h and found the macro denfination

/**
 * CUDA API version number
 */
#define CUDA_VERSION 12040

Apr 23 '24 05:04 sdy623

I found a way to check the cuda.h without install it, just use 7-Zip to open the cuda_12.4.1_551.78_windows.exe and find this directory cuda_12.4.1_551.78_windows.exe\cuda_cudart\cudart\include\cuda.h and found the macro denfination

Whilst this is a completely valid way to check the header I would advise you to install CUDA 12.4 when submitting a PR which fixes something that it breaks.

If you do that you will realize that

Your conversions from size_t to int are throwing warnings

warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data

which you should address when using the new buffer size type as I mentioned above. e.g.
```
CV_Assert(bufSize <= std::numeric_limits<int>::max());
GpuMat buf = pool.getBuffer(1, static_cast<int>(bufSize), gsrc.type());
```
Because of the existing bug I mentioned before https://github.com/opencv/opencv_contrib/issues/3690 you will be unable to build the cudaarithm module even with this fix. i.e.

cmake --build . --target opencv_cudaarithm

still fails.

If I was authoring this PR I would install both CUDA 12.3 and 12.4 and check that this builds on both without errors.

Apr 23 '24 07:04 cudawarped

@opencv-alalek this resolves the issue mentioned but still results in build errors because cudaarithm uses cudev ( https://github.com/opencv/opencv_contrib/issues/3690). I suggest this is revisited when cudev can be built against CUDA 12.4.

Apr 23 '24 18:04 cudawarped

I still have an issue https://github.com/opencv/opencv_contrib/issues/3728

Apr 28 '24 22:04 jiapei100

I still have an issue #3728

@jiapei100 Your issue is related to cudev (https://github.com/opencv/opencv_contrib/issues/3690) not cudaarithm.

Apr 29 '24 04:04 cudawarped

great job! thank you

May 09 '24 22:05 johnnynunez

opencv_contrib opencv_contrib copied to clipboard

Try to Fix #3725: cudaarithm: fix the compile faiure of CUDA 12.

Pull Request Readiness Checklist

opencv_contrib
opencv_contrib copied to clipboard