
CudaImage allocate error in my project (segmentation fault)

Open • richard-elvira opened this issue 2 years ago • 1 comment

Hi, I am working with this library, and the examples already run without problems.

My main problem is integrating CudaSift into my project. I have added the CudaSift code and it runs well, but when I try to allocate an image in the CudaImage container, the program raises a segmentation fault inside the Allocate function.

Here is the code, in case there is something I am missing:

InitCuda(0); // Initialize with the device 0 (from all the devices with CUDA)

if(img.empty())
{
    std::cerr << "Empty image in CudaSiftHandler::extractCudaSift" << std::endl;
}

cv::Mat img_32f;
img.convertTo(img_32f, CV_32FC1, 1/255.0);

unsigned int w = img_32f.cols;
unsigned int h = img_32f.rows;
std::cout << "img_32f cols: " << w << "; rows: " << h << "; channels: " << img_32f.channels() << std::endl;
std::cout << "img_32f type: " << img_32f.type() << std::endl;

mCudaImg.Allocate(w, h, w, false, static_cast<float*>(NULL), (float*)img_32f.data);
mCudaImg.Download();

InitSiftData(mSiftDataImgExt, mnMaxFeatures, true, true);

ExtractSift(mSiftDataImgExt, mCudaImg, mnNumOctaves, mfInitBlur, mfThreshold, mnMinScale, false);
std::cout << "There are " << mSiftDataImgExt.numPts << " sift points detected by GPU" << std::endl;

The output of my code before the segmentation fault shows that InitCuda has initialized the graphics card correctly, and that the image is not empty and has the correct format (float):

Device Number: 0
Device name: NVIDIA TITAN Xp
Memory Clock Rate (MHz): 5705
Memory Bus Width (bits): 384
Peak Memory Bandwidth (GB/s): 547.7

img_32f cols: 1440; rows: 1080; channels: 1
img_32f type: 5

Disassembly (at the beginning of the Allocate code; the fault is on line 5):

0x5555555644f0                  f3 0f 1e fa                       endbr64
0x5555555644f4  <+    4>        53                                push   %rbx
0x5555555644f5  <+    5>        48 89 fb                          mov    %rdi,%rbx
0x5555555644f8  <+    8>        48 83 ec 10                       sub    $0x10,%rsp
0x5555555644fc  <+   12>        89 37                             mov    %esi,(%rdi)   <---- This instruction raises the error.
0x5555555644fe  <+   14>        89 57 04                          mov    %edx,0x4(%rdi)
0x555555564501  <+   17>        c5 fa 7e 4c 24 20                 vmovq  0x20(%rsp),%xmm1
0x555555564507  <+   23>        c4 c3 f1 22 c1 01                 vpinsrq $0x1,%r9,%xmm1,%xmm0
0x55555556450d  <+   29>        89 4f 08                          mov    %ecx,0x8(%rdi)
0x555555564510  <+   32>        48 c7 47 20 00 00 00 00           movq   $0x0,0x20(%rdi)
0x555555564518  <+   40>        c5 f8 11 47 10                    vmovups %xmm0,0x10(%rdi)
0x55555556451d  <+   45>        4d 85 c9                          test   %r9,%r9
0x555555564520  <+   48>        74 4e                             je     0x555555564570 <_ZN9CudaImage8AllocateEiiibPfS0_+128>
0x555555564522  <+   50>        48 83 7c 24 20 00                 cmpq   $0x0,0x20(%rsp)
0x555555564528  <+   56>        75 05                             jne    0x55555556452f <_ZN9CudaImage8AllocateEiiibPfS0_+63>
0x55555556452a  <+   58>        45 84 c0                          test   %r8b,%r8b
0x55555556452d  <+   61>        75 11                             jne    0x555555564540 <_ZN9CudaImage8AllocateEiiibPfS0_+80>
0x55555556452f  <+   63>        48 83 c4 10                       add    $0x10,%rsp
0x555555564533  <+   67>        5b                                pop    %rbx
0x555555564534  <+   68>        c3                                retq
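
One possible reading of the disassembly above, offered as an assumption rather than a confirmed diagnosis: in the System V x86-64 calling convention, `%rdi` carries the implicit `this` pointer of a member function, so `mov %esi,(%rdi)` is the first store into the object itself. A fault on that store would suggest that the object being written (or the object that owns it) is not valid, independently of the image data. The sketch below uses hypothetical stand-in types (`FakeImage`, `Handler`, `safeAllocate`), not the CudaSift classes, to show how a guard on the owning pointer could surface such a problem before the call.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical stand-in for an image container; NOT the CudaSift class.
struct FakeImage {
    int width = 0;
    int height = 0;
    int pitch = 0;

    // The first statement writes through `this`, which is the kind of
    // store that `mov %esi,(%rdi)` performs in the listing above.
    void Allocate(int w, int h, int p) {
        assert(w > 0 && h > 0 && p >= w);  // basic sanity check on the sizes
        width = w;
        height = h;
        pitch = p;
    }
};

// Hypothetical handler class that owns the image as a member.
struct Handler {
    FakeImage mImg;
};

// Guarded call: refuses to touch the member when the owner pointer is null.
// Calling handler->mImg.Allocate(...) through a null handler would be
// undefined behaviour and typically faults on the first store in Allocate.
bool safeAllocate(Handler* handler, int w, int h, int p) {
    if (handler == nullptr) {
        std::cerr << "Handler pointer is null; skipping Allocate" << std::endl;
        return false;
    }
    handler->mImg.Allocate(w, h, p);
    return true;
}
```

Checking the value of `this` (or of the pointer to the handler object) in the debugger at the faulting instruction would confirm or rule out this reading.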

Thanks for the help

richard-elvira • Sep 01 '22 10:09

I found a problem with the image values: it appears that my float image has values between 0 and 1, but CudaImage expects a float image with values between 0 and 255. If I change img.convertTo(img_32f, CV_32FC1, 1/255.0); to img.convertTo(img_32f, CV_32FC1); it is able to allocate the image, but a new error arises in the Download function.

The strange thing is that the same code, with the same image, works well in the main function of the launcher, but does not work in the handler class that I wrote for it.

In the main function, it prints these results:
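
To make the difference between the two convertTo calls concrete, here is a minimal sketch that reproduces only the scaling arithmetic on a plain buffer, without OpenCV. `toFloat` is a hypothetical helper introduced for illustration; it assumes the conversion multiplies each 8-bit pixel by the scale factor, so a factor of 1 keeps the 0..255 range and a factor of 1/255 maps it to 0..1.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper mirroring the arithmetic of converting 8-bit pixels
// to float with a scale factor:
//   scale = 1.0f          keeps the 0..255 range
//   scale = 1.0f / 255.0f maps the values into 0..1
std::vector<float> toFloat(const std::vector<std::uint8_t>& pixels, float scale) {
    assert(scale > 0.0f);  // a non-positive scale makes no sense here
    std::vector<float> out;
    out.reserve(pixels.size());
    for (std::uint8_t p : pixels) {
        out.push_back(static_cast<float>(p) * scale);
    }
    return out;
}
```

With a scale of 1, a pixel of 254 stays at 254; with a scale of 1/255, the same pixel ends up just below 1.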

img_32f cols: 720; rows: 540; channels: 1
img_32f type: 5 // CV_32F
Frame: min val: 16 // Minimum value on the image
Frame: max val: 254 // Maximum value on the image
mCudaImg width: 720; height: 540; pitch: 768
SIFT extraction time =        0.60 ms 3770
Incl prefiltering & memcpy =  2.38 ms 3770

There are 3770 sift points detected by GPU
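
The `pitch: 768` reported above for a 720-pixel-wide image is consistent with the row pitch being rounded up to a multiple of 128 floats (512 bytes). That rounding rule is an assumption inferred from this output, and `alignUp` is a hypothetical helper name; the sketch only shows the arithmetic.

```cpp
#include <cassert>

// Hypothetical helper: round `value` up to the next multiple of `alignment`.
int alignUp(int value, int alignment) {
    assert(alignment > 0);  // the alignment must be a positive number
    int remainder = value % alignment;
    return (remainder == 0) ? value : value + (alignment - remainder);
}
```

Under the same assumption, `alignUp(720, 128)` gives 768, and a 1440-pixel-wide image would get a pitch of 1536 rather than 1440.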

But if the same code is in the class that handles CudaSift, it raises a segmentation fault inside the Download function. This is the output before the error:

img_32f cols: 720; rows: 540; channels: 1
img_32f type: 5
min val: 16
max val: 254
mCudaImg width: 720; height: 540; pitch: 768

And this is the disassembly:

0x7ffff7e698eb  <+ 1051>        4c 89 f7                          mov    %r14,%rdi
0x7ffff7e698ee  <+ 1054>        e8 3d a6 de ff                    callq  0x7ffff7c53f30 <_ZNSolsEi@plt>
0x7ffff7e698f3  <+ 1059>        ba 09 00 00 00                    mov    $0x9,%edx
0x7ffff7e698f8  <+ 1064>        48 8d 35 ef 09 07 00              lea    0x709ef(%rip),%rsi        # 0x7ffff7eda2ee
0x7ffff7e698ff  <+ 1071>        48 89 c7                          mov    %rax,%rdi
0x7ffff7e69902  <+ 1074>        49 89 c6                          mov    %rax,%r14
0x7ffff7e69905  <+ 1077>        e8 66 ea de ff                    callq  0x7ffff7c58370 <_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l@plt>
0x7ffff7e6990a  <+ 1082>        8b b5 38 ff ff ff                 mov    -0xc8(%rbp),%esi
0x7ffff7e69910  <+ 1088>        4c 89 f7                          mov    %r14,%rdi
0x7ffff7e69913  <+ 1091>        e8 18 a6 de ff                    callq  0x7ffff7c53f30 <_ZNSolsEi@plt>
0x7ffff7e69918  <+ 1096>        49 89 c6                          mov    %rax,%r14
0x7ffff7e6991b  <+ 1099>        48 8b 00                          mov    (%rax),%rax
0x7ffff7e6991e  <+ 1102>        48 8b 40 e8                       mov    -0x18(%rax),%rax
0x7ffff7e69922  <+ 1106>        4d 8b bc 06 f0 00 00 00           mov    0xf0(%r14,%rax,1),%r15
0x7ffff7e6992a  <+ 1114>        4d 85 ff                          test   %r15,%r15
0x7ffff7e6992d  <+ 1117>        0f 84 b4 02 00 00                 je     0x7ffff7e69be7 <_ZN9ORB_SLAM315CudaSiftHandler15extractCudaSIFTERN2cv3MatE+1815>
0x7ffff7e69933  <+ 1123>        41 80 7f 38 00                    cmpb   $0x0,0x38(%r15)
0x7ffff7e69938  <+ 1128>        0f 84 b2 01 00 00                 je     0x7ffff7e69af0 <_ZN9ORB_SLAM315CudaSiftHandler15extractCudaSIFTERN2cv3MatE+1568>
0x7ffff7e6993e  <+ 1134>        41 0f be 77 43                    movsbl 0x43(%r15),%esi
0x7ffff7e69943  <+ 1139>        4c 89 f7                          mov    %r14,%rdi
0x7ffff7e69946  <+ 1142>        e8 25 b9 de ff                    callq  0x7ffff7c55270 <_ZNSo3putEc@plt>
0x7ffff7e6994b  <+ 1147>        48 89 c7                          mov    %rax,%rdi
0x7ffff7e6994e  <+ 1150>        e8 1d b4 de ff                    callq  0x7ffff7c54d70 <_ZNSo5flushEv@plt>
0x7ffff7e69953  <+ 1155>        4c 89 e7                          mov    %r12,%rdi
0x7ffff7e69956  <+ 1158>        e8 75 02 df ff                    callq  0x7ffff7c59bd0 <_ZN9CudaImage8DownloadEv@plt>
0x7ffff7e6995b  <+ 1163>        c5 fa 10 4b 08                    vmovss 0x8(%rbx),%xmm1    <------- In this instruction it breaks.
0x7ffff7e69960  <+ 1168>        c5 e8 57 d2                       vxorps %xmm2,%xmm2,%xmm2
0x7ffff7e69964  <+ 1172>        8b 53 04                          mov    0x4(%rbx),%edx
0x7ffff7e69967  <+ 1175>        48 8d 7b 18                       lea    0x18(%rbx),%rdi
0x7ffff7e6996b  <+ 1179>        c5 ea 5a 03                       vcvtss2sd (%rbx),%xmm2,%xmm0
0x7ffff7e6996f  <+ 1183>        45 31 c0                          xor    %r8d,%r8d
0x7ffff7e69972  <+ 1186>        31 c9                             xor    %ecx,%ecx
0x7ffff7e69974  <+ 1188>        4c 89 e6                          mov    %r12,%rsi
0x7ffff7e69977  <+ 1191>        c5 ea 2a 53 10                    vcvtsi2ssl 0x10(%rbx),%xmm2,%xmm2
0x7ffff7e6997c  <+ 1196>        e8 0f e9 de ff                    callq  0x7ffff7c58290 <_Z11ExtractSiftR8SiftDataR9CudaImageidffbPf@plt>

richard-elvira • Sep 02 '22 10:09