opencv_contrib
opencv_contrib copied to clipboard
Bug in cv::cuda::warpPerspective
System information (version)
- OpenCV =>
4.1.0 - Operating System / Platform =>
debian10 - Compiler =>
g++-8.3.0 - Cuda =>
cuda-10.1
Detailed description
With some matricies (perspective transformation) I get exception about illegal memory access in GPU memory:
OpenCV(4.1.0) /tmp/opencv_dir/modules/core/src/cuda_stream.cpp:473: error: (-217:Gpu API call) an illegal memory access was encountered in function 'waitForCompletion'
This happens only if I use not default stream and linear interpolation.
Steps to reproduce
Tested with cuda-10.1 and opencv-4.1.0.
// main.cpp
#include <cstdlib>
#include <opencv2/cudawarping.hpp>
#include <vector>
int main() {
cv::Size inputSize{1000, 1000}; // can be different
cv::Size outSize{670, 893};
std::vector<double> values{
0.3154414342519155, 0.2158186507123098, 133.1013012869572,
-0.01298153793321658, 0.5681477441984867, 297.7284881948252,
1.839213382549264e-05, 0.0006366310508410327, 1};
cv::Mat matrix{cv::Size{3, 3}, CV_64FC1, values.data()};
cv::cuda::GpuMat image{inputSize, CV_8UC4};
cv::cuda::Stream stream;
cv::cuda::warpPerspective(image, image, matrix, outSize, cv::INTER_LINEAR,
cv::BORDER_CONSTANT, cv::Scalar{0, 0, 0, 0},
stream);
stream.waitForCompletion();
return EXIT_SUCCESS;
}
# cmake
cmake_minimum_required(VERSION 3.12)
project(min_ex)
find_package(OpenCV COMPONENTS core cudawarping REQUIRED)
add_executable(${PROJECT_NAME} main.cpp)
target_compile_features(${PROJECT_NAME} PRIVATE cxx_std_11)
target_link_libraries(${PROJECT_NAME} PRIVATE ${OpenCV_LIBS})
target_include_directories(${PROJECT_NAME} PRIVATE ${OpenCV_INCLUDE_DIRS})
Same on 3.4 branch.
Testing with CUDA8.0 and CUDA10.1, both cases seem to have this bug. GTX 1080 Ti
I simplify example to:
// main.cpp
#include <cstdlib>
#include <iostream>
#include <opencv2/cudawarping.hpp>
#include <vector>
int main() {
cv::Size inputSize{1, 1};
cv::Size outSize{1, 1};
// clang-format off
std::vector<float> values{
0, 0, 0,
0, 0, 2147483648, // XXX after converting to int we get -2147483648
};
// clang-format on
cv::Mat matrix{cv::Size{3, 2}, CV_32FC1, values.data()};
cv::cuda::GpuMat image{inputSize, CV_8UC1};
cv::cuda::GpuMat output;
cv::cuda::Stream stream;
cv::cuda::warpAffine(image, output, matrix, outSize,
cv::INTER_LINEAR | cv::WARP_INVERSE_MAP,
cv::BORDER_CONSTANT, cv::Scalar{0}, stream);
stream.waitForCompletion();
return EXIT_SUCCESS;
}
Also I recompile opencv with nvcc flag -lineinfo and run this example by cuda-memcheck. Here is a output:
OpenCV(4.1.0) Error: Gpu API call (unspecified launch failure) in waitForCompletion, file /home/user/tmp/opencv_dir/modules/core/src/cuda_stream.cpp, line 473
terminate called after throwing an instance of 'cv::Exception'
what(): OpenCV(4.1.0) /home/user/tmp/opencv_dir/modules/core/src/cuda_stream.cpp:473: error: (-217:Gpu API call) unspecified launch failure in function 'waitForCompletion'
========= CUDA-MEMCHECK
========= Invalid __global__ read of size 1
========= at 0x00000350 in /home/user/tmp/opencv_dir/modules/core/include/opencv2/core/cuda/filters.hpp:97:void cv::cuda::device::imgproc::warp<cv::cuda::device::imgproc::AffineTransform, cv::cuda::device::LinearFilter<cv::cuda::device::BorderReader<cv::cuda::PtrStep<unsigned char>, cv::cuda::device::BrdConstant<float>>>, unsigned char>(unsigned char, cv::cuda::PtrStepSz<cv::cuda::PtrStep<unsigned char>>)
========= by thread (0,0,0) in block (0,0,0)
========= Address 0x7fdb9c600000 is out of bounds
========= Device Frame:/home/user/tmp/opencv_dir/modules/core/include/opencv2/core/cuda/filters.hpp:97:void cv::cuda::device::imgproc::warp<cv::cuda::device::imgproc::AffineTransform, cv::cuda::device::LinearFilter<cv::cuda::device::BorderReader<cv::cuda::PtrStep<unsigned char>, cv::cuda::device::BrdConstant<float>>>, unsigned char>(unsigned char, cv::cuda::PtrStepSz<cv::cuda::PtrStep<unsigned char>>) (void cv::cuda::device::imgproc::warp<cv::cuda::device::imgproc::AffineTransform, cv::cuda::device::LinearFilter<cv::cuda::device::BorderReader<cv::cuda::PtrStep<unsigned char>, cv::cuda::device::BrdConstant<float>>>, unsigned char>(unsigned char, cv::cuda::PtrStepSz<cv::cuda::PtrStep<unsigned char>>) : 0x350)
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2fe) [0x282a4e]
========= Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x6ccc19]
========= Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x6ccca7]
========= Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x702ff5]
========= Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x1451b6]
========= Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x105e5e]
========= Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x105e8f]
========= Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x145430]
========= Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x15d267]
========= Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x14d032]
========= Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x14b4d4]
========= Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x10365f]
========= Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 (_ZN2cv4cuda10warpAffineERKNS_11_InputArrayERKNS_12_OutputArrayES3_NS_5Size_IiEEiiNS_7Scalar_IdEERNS0_6StreamE + 0x8e9) [0x62ca8]
========= Host Frame:min_ex [0x2469]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xeb) [0x2409b]
========= Host Frame:min_ex [0x21ba]
=========
========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaStreamSynchronize.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/lib/x86_64-linux-gnu/libcuda.so.1 [0x391b13]
========= Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_core.so.4.1 [0xeea61e]
========= Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_core.so.4.1 (_ZN2cv4cuda6Stream17waitForCompletionEv + 0x21) [0x21e61f]
========= Host Frame:min_ex [0x24a0]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xeb) [0x2409b]
========= Host Frame:min_ex [0x21ba]
=========
========= Error: process didn't terminate successfully
========= No CUDA-MEMCHECK results found
cuda/filters.hpp:97 is just:
const int y1 = __float2int_rd(y);
from device method of LinearFilter::operator().
So, I think, problem in integer overflow. See on our transformation matrix - if we change our y-move value to 2147483583., then all will works fine.
Yes, problem is in integer overflow. Integer overflow is UB on cuda (like in c++ standard), so I just try change string that caused error from const int y1 = __float2int_rd(y); to const unsigned int y1 = __float2uint_rd(y); and after it I can't reproduce my error anymore. So, it looks like using signed integer values in cuda device functions is really bad idea.
#2712 also not reproduced after previous fix
Unfortunately I can't reproduce this error in some separate cuda function
Affine transformation supports three types of interpolation: nearest, linear and cubic - but problem reproduce only with INTER_LINEAR and with not default stream only. This is very strange, because function __float2int_rd uses in cubic filter also.
Hey ! I have an error with INTER_LINEAR type also, with NEAREST everything is working fine. How to solve overflow in matrix in this code? Trying to solve this problem for 2 days , no luck (. Thank you.
Mat CudaWarp(Mat aImgOriginal, Mat matrix, float w, float h) {
Mat output;
cv::cuda::GpuMat gpuOutput;
cv::cuda::GpuMat gpuInput;
gpuInput.upload(aImgOriginal);
cv::cuda::warpPerspective(gpuInput, gpuOutput, matrix, Point(w, h), INTER_LINEAR, BORDER_CONSTANT,0);
gpuOutput.download(output);
return output;
}
@thilipwka please, see my previous comments. Currently you can't fix this problem, because problem is very deep in cuda code
@thilipwka please, see my previous comments. Currently you can't fix this problem, because problem is very deep in cuda code
thank you
I have the same problem
cv::cuda::remap(inputGPU, inputGPU, M1Gpu, M2Gpu, cv::INTER_LINEAR);
but it solved when I pass different gpu matrix for input and output
cv::cuda::remap(inputGPU, outputGPU, M1Gpu, M2Gpu, cv::INTER_LINEAR);
@andrejlevkovitch Is there some other way of doing a remap? A workaround?