opencv_contrib icon indicating copy to clipboard operation
opencv_contrib copied to clipboard

Bug in cv::cuda::warpPerspective

Open andrejlevkovitch opened this issue 6 years ago • 13 comments

System information (version)
  • OpenCV => 4.1.0
  • Operating System / Platform => debian10
  • Compiler => g++-8.3.0
  • Cuda => cuda-10.1
Detailed description

With some matricies (perspective transformation) I get exception about illegal memory access in GPU memory:

OpenCV(4.1.0) /tmp/opencv_dir/modules/core/src/cuda_stream.cpp:473: error: (-217:Gpu API call) an illegal memory access was encountered in function 'waitForCompletion'

This happens only if I use not default stream and linear interpolation.

Steps to reproduce

Tested with cuda-10.1 and opencv-4.1.0.

// main.cpp

#include <cstdlib>
#include <opencv2/cudawarping.hpp>
#include <vector>

int main() {
  cv::Size inputSize{1000, 1000}; // can be different

  cv::Size outSize{670, 893};
  std::vector<double> values{
      0.3154414342519155,    0.2158186507123098,    133.1013012869572,
      -0.01298153793321658,  0.5681477441984867,    297.7284881948252,
      1.839213382549264e-05, 0.0006366310508410327, 1};

  cv::Mat matrix{cv::Size{3, 3}, CV_64FC1, values.data()};

  cv::cuda::GpuMat image{inputSize, CV_8UC4};

  cv::cuda::Stream stream;
  cv::cuda::warpPerspective(image, image, matrix, outSize, cv::INTER_LINEAR,
                            cv::BORDER_CONSTANT, cv::Scalar{0, 0, 0, 0},
                            stream);
  stream.waitForCompletion();

  return EXIT_SUCCESS;
}
# cmake

cmake_minimum_required(VERSION 3.12)

project(min_ex)

find_package(OpenCV COMPONENTS core cudawarping REQUIRED)

add_executable(${PROJECT_NAME} main.cpp)
target_compile_features(${PROJECT_NAME} PRIVATE cxx_std_11)
target_link_libraries(${PROJECT_NAME} PRIVATE ${OpenCV_LIBS})
target_include_directories(${PROJECT_NAME} PRIVATE ${OpenCV_INCLUDE_DIRS})

andrejlevkovitch avatar Nov 25 '19 07:11 andrejlevkovitch

Same on 3.4 branch.

nglee avatar Dec 29 '19 23:12 nglee

Testing with CUDA8.0 and CUDA10.1, both cases seem to have this bug. GTX 1080 Ti

nglee avatar Aug 28 '20 07:08 nglee

I simplify example to:

// main.cpp

#include <cstdlib>
#include <iostream>
#include <opencv2/cudawarping.hpp>
#include <vector>

int main() {
  cv::Size inputSize{1, 1};
  cv::Size outSize{1, 1};

  // clang-format off
  std::vector<float> values{
      0, 0, 0,
      0, 0, 2147483648, // XXX after converting to int we get -2147483648
  };
  // clang-format on

  cv::Mat matrix{cv::Size{3, 2}, CV_32FC1, values.data()};

  cv::cuda::GpuMat image{inputSize, CV_8UC1};
  cv::cuda::GpuMat output;

  cv::cuda::Stream stream;
  cv::cuda::warpAffine(image, output, matrix, outSize,
                       cv::INTER_LINEAR | cv::WARP_INVERSE_MAP,
                       cv::BORDER_CONSTANT, cv::Scalar{0}, stream);
  stream.waitForCompletion();

  return EXIT_SUCCESS;
}

Also I recompile opencv with nvcc flag -lineinfo and run this example by cuda-memcheck. Here is a output:

OpenCV(4.1.0) Error: Gpu API call (unspecified launch failure) in waitForCompletion, file /home/user/tmp/opencv_dir/modules/core/src/cuda_stream.cpp, line 473
terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(4.1.0) /home/user/tmp/opencv_dir/modules/core/src/cuda_stream.cpp:473: error: (-217:Gpu API call) unspecified launch failure in function 'waitForCompletion'

========= CUDA-MEMCHECK
========= Invalid __global__ read of size 1
=========     at 0x00000350 in /home/user/tmp/opencv_dir/modules/core/include/opencv2/core/cuda/filters.hpp:97:void cv::cuda::device::imgproc::warp<cv::cuda::device::imgproc::AffineTransform, cv::cuda::device::LinearFilter<cv::cuda::device::BorderReader<cv::cuda::PtrStep<unsigned char>, cv::cuda::device::BrdConstant<float>>>, unsigned char>(unsigned char, cv::cuda::PtrStepSz<cv::cuda::PtrStep<unsigned char>>)
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7fdb9c600000 is out of bounds
=========     Device Frame:/home/user/tmp/opencv_dir/modules/core/include/opencv2/core/cuda/filters.hpp:97:void cv::cuda::device::imgproc::warp<cv::cuda::device::imgproc::AffineTransform, cv::cuda::device::LinearFilter<cv::cuda::device::BorderReader<cv::cuda::PtrStep<unsigned char>, cv::cuda::device::BrdConstant<float>>>, unsigned char>(unsigned char, cv::cuda::PtrStepSz<cv::cuda::PtrStep<unsigned char>>) (void cv::cuda::device::imgproc::warp<cv::cuda::device::imgproc::AffineTransform, cv::cuda::device::LinearFilter<cv::cuda::device::BorderReader<cv::cuda::PtrStep<unsigned char>, cv::cuda::device::BrdConstant<float>>>, unsigned char>(unsigned char, cv::cuda::PtrStepSz<cv::cuda::PtrStep<unsigned char>>) : 0x350)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2fe) [0x282a4e]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x6ccc19]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x6ccca7]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x702ff5]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x1451b6]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x105e5e]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x105e8f]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x145430]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x15d267]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x14d032]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x14b4d4]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 [0x10365f]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_cudawarping.so.4.1 (_ZN2cv4cuda10warpAffineERKNS_11_InputArrayERKNS_12_OutputArrayES3_NS_5Size_IiEEiiNS_7Scalar_IdEERNS0_6StreamE + 0x8e9) [0x62ca8]
=========     Host Frame:min_ex [0x2469]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xeb) [0x2409b]
=========     Host Frame:min_ex [0x21ba]
=========
========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaStreamSynchronize. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/lib/x86_64-linux-gnu/libcuda.so.1 [0x391b13]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_core.so.4.1 [0xeea61e]
=========     Host Frame:/home/user/tmp/opencv_dir/build/lib/libopencv_core.so.4.1 (_ZN2cv4cuda6Stream17waitForCompletionEv + 0x21) [0x21e61f]
=========     Host Frame:min_ex [0x24a0]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xeb) [0x2409b]
=========     Host Frame:min_ex [0x21ba]
=========
========= Error: process didn't terminate successfully
========= No CUDA-MEMCHECK results found

cuda/filters.hpp:97 is just:

const int y1 = __float2int_rd(y);

from device method of LinearFilter::operator().

andrejlevkovitch avatar Jan 14 '21 09:01 andrejlevkovitch

So, I think, problem in integer overflow. See on our transformation matrix - if we change our y-move value to 2147483583., then all will works fine.

andrejlevkovitch avatar Jan 14 '21 10:01 andrejlevkovitch

Yes, problem is in integer overflow. Integer overflow is UB on cuda (like in c++ standard), so I just try change string that caused error from const int y1 = __float2int_rd(y); to const unsigned int y1 = __float2uint_rd(y); and after it I can't reproduce my error anymore. So, it looks like using signed integer values in cuda device functions is really bad idea.

andrejlevkovitch avatar Jan 14 '21 13:01 andrejlevkovitch

#2712 also not reproduced after previous fix

andrejlevkovitch avatar Jan 14 '21 13:01 andrejlevkovitch

Unfortunately I can't reproduce this error in some separate cuda function

andrejlevkovitch avatar Jan 15 '21 09:01 andrejlevkovitch

Affine transformation supports three types of interpolation: nearest, linear and cubic - but problem reproduce only with INTER_LINEAR and with not default stream only. This is very strange, because function __float2int_rd uses in cubic filter also.

andrejlevkovitch avatar Jan 15 '21 10:01 andrejlevkovitch

Hey ! I have an error with INTER_LINEAR type also, with NEAREST everything is working fine. How to solve overflow in matrix in this code? Trying to solve this problem for 2 days , no luck (. Thank you.

Mat CudaWarp(Mat aImgOriginal, Mat matrix, float w, float h) {
	Mat output;
	cv::cuda::GpuMat gpuOutput;
	cv::cuda::GpuMat  gpuInput;

	gpuInput.upload(aImgOriginal);

	cv::cuda::warpPerspective(gpuInput, gpuOutput, matrix, Point(w, h), INTER_LINEAR, BORDER_CONSTANT,0);

	gpuOutput.download(output);


	return output;

}

thilipwka avatar Sep 09 '21 20:09 thilipwka

@thilipwka please, see my previous comments. Currently you can't fix this problem, because problem is very deep in cuda code

andrejlevkovitch avatar Sep 10 '21 16:09 andrejlevkovitch

@thilipwka please, see my previous comments. Currently you can't fix this problem, because problem is very deep in cuda code

thank you

thilipwka avatar Sep 10 '21 18:09 thilipwka

I have the same problem

cv::cuda::remap(inputGPU, inputGPU, M1Gpu, M2Gpu, cv::INTER_LINEAR);

but it solved when I pass different gpu matrix for input and output

cv::cuda::remap(inputGPU, outputGPU, M1Gpu, M2Gpu, cv::INTER_LINEAR);

RamadanAhmed avatar Mar 08 '22 11:03 RamadanAhmed

@andrejlevkovitch Is there some other way of doing a remap? A workaround?

Emmanuel-Messulam avatar May 30 '22 16:05 Emmanuel-Messulam