CudaImage allocate error in my project (segmentation fault)
Hi, I am working with this library, and I can already run the examples without problems.
My main problem is integrating CudaSift into my project. I have added the CudaSift code and it runs well, but the problem comes when I try to allocate an image in the CudaImage container: the program crashes with a segmentation fault inside the Allocate function.
Here is the code, in case there is something I am missing:
InitCuda(0); // Initialize with the device 0 (from all the devices with CUDA)
std::cerr << "Empty image in CudaSiftHandler::extractCudaSift" << std::endl;
cv::Mat img_32f;
img.convertTo(img_32f, CV_32FC1, 1/255.0);
unsigned int w = img_32f.cols;
unsigned int h = img_32f.rows;
std::cout << "img_32f cols: " << w << "; rows: " << h << "; channels: " << img_32f.channels() << std::endl;
std::cout << "img_32f type: " << img_32f.type() << std::endl;
mCudaImg.Allocate(w, h, w, false, static_cast<float*>(NULL), (float*)img_32f.data);
InitSiftData(mSiftDataImgExt, mnMaxFeatures, true, true);
ExtractSift(mSiftDataImgExt, mCudaImg, mnNumOctaves, mfInitBlur, mfThreshold, mnMinScale, false);
std::cout << "There are " << mSiftDataImgExt.numPts << " sift points detected by GPU" << std::endl;
The output of my code before the segmentation fault shows that InitCuda initialized the graphics card correctly, and that the image is not empty and has the correct format (float):
Device Number: 0
Device name: NVIDIA TITAN Xp
Memory Clock Rate (MHz): 5705
Memory Bus Width (bits): 384
Peak Memory Bandwidth (GB/s): 547.7
img_32f cols: 1440; rows: 1080; channels: 1
img_32f type: 5
Disassembly (at the beginning of the Allocate code, the error is at line 5):
0x5555555644f0 f3 0f 1e fa endbr64
0x5555555644f4 <+ 4> 53 push %rbx
0x5555555644f5 <+ 5> 48 89 fb mov %rdi,%rbx
0x5555555644f8 <+ 8> 48 83 ec 10 sub $0x10,%rsp
0x5555555644fc <+ 12> 89 37 mov %esi,(%rdi) <---- This instruction raises the error.
0x5555555644fe <+ 14> 89 57 04 mov %edx,0x4(%rdi)
0x555555564501 <+ 17> c5 fa 7e 4c 24 20 vmovq 0x20(%rsp),%xmm1
0x555555564507 <+ 23> c4 c3 f1 22 c1 01 vpinsrq $0x1,%r9,%xmm1,%xmm0
0x55555556450d <+ 29> 89 4f 08 mov %ecx,0x8(%rdi)
0x555555564510 <+ 32> 48 c7 47 20 00 00 00 00 movq $0x0,0x20(%rdi)
0x555555564518 <+ 40> c5 f8 11 47 10 vmovups %xmm0,0x10(%rdi)
0x55555556451d <+ 45> 4d 85 c9 test %r9,%r9
0x555555564520 <+ 48> 74 4e je 0x555555564570 <_ZN9CudaImage8AllocateEiiibPfS0_+128>
0x555555564522 <+ 50> 48 83 7c 24 20 00 cmpq $0x0,0x20(%rsp)
0x555555564528 <+ 56> 75 05 jne 0x55555556452f <_ZN9CudaImage8AllocateEiiibPfS0_+63>
0x55555556452a <+ 58> 45 84 c0 test %r8b,%r8b
0x55555556452d <+ 61> 75 11 jne 0x555555564540 <_ZN9CudaImage8AllocateEiiibPfS0_+80>
0x55555556452f <+ 63> 48 83 c4 10 add $0x10,%rsp
0x555555564533 <+ 67> 5b pop %rbx
0x555555564534 <+ 68> c3 retq
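One possible reading of this listing, offered as an assumption rather than a confirmed diagnosis: under the System V x86-64 calling convention, %rdi carries the `this` pointer of a member function, so the marked store is the first write through the CudaImage object (an int field at offset 0). A fault there would be consistent with the object that owns mCudaImg being null or already destroyed when Allocate is called, rather than with the image contents. The sketch below uses hypothetical stand-in types (ImageHeader, Handler and safeAllocate are not part of CudaSift) to show the pattern and a simple guard:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the first three fields of CudaImage. The layout is
// assumed from the stores to offsets 0x0, 0x4 and 0x8 in the listing.
struct ImageHeader {
    int width;
    int height;
    int pitch;
};

// Hypothetical handler that owns the image by value, standing in for the
// class that holds mCudaImg.
struct Handler {
    ImageHeader img{};

    void allocate(int w, int h, int p) {
        assert(w > 0 && h > 0 && p >= w);
        img.width = w;   // first store through `this`; faults if `this` is invalid
        img.height = h;
        img.pitch = p;
    }
};

// Guarded call: refuses to touch the image when the handler pointer is null.
// A dangling (already freed) pointer cannot be detected this way.
bool safeAllocate(Handler* handler, int w, int h, int p) {
    if (handler == nullptr) {
        std::fprintf(stderr, "handler is null, skipping allocation\n");
        return false;
    }
    handler->allocate(w, h, p);
    return true;
}
```

If a guard like this fires in the real project, the place where the handler object is constructed and stored would be the next thing to inspect.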
Thanks for the help
I found a problem with the image values: my float image has values between 0 and 1, but CudaImage expects a float image with values between 0 and 255. If I change
img.convertTo(img_32f, CV_32FC1, 1/255.0);
to
img.convertTo(img_32f, CV_32FC1);
it is able to allocate the image, but a new error arises in the Download function. The weird thing is that if I put the same code, with the same image, in the main function of the launcher, it works well, but in the handler that I prepared for it, it doesn't work.
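To illustrate the difference between the two conversions, here is a minimal sketch in plain C++ that mimics the scaling step for 8-bit input. scalePixels is a hypothetical helper, not an OpenCV or CudaSift function. For a floating-point destination, convertTo computes src * alpha + beta, and the sketch applies only the alpha factor:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: converts 8-bit pixels to float and multiplies by alpha,
// similar in spirit to cv::Mat::convertTo(dst, CV_32FC1, alpha) with beta = 0.
std::vector<float> scalePixels(const std::vector<std::uint8_t>& src, float alpha) {
    std::vector<float> dst;
    dst.reserve(src.size());
    for (std::uint8_t value : src) {
        dst.push_back(static_cast<float>(value) * alpha);
    }
    return dst;
}
```

With alpha = 1.0f an 8-bit value of 254 stays 254.0f, while with alpha = 1.0f / 255.0f it becomes a value just below 1.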
The main function prints these results:
img_32f cols: 720; rows: 540; channels: 1
img_32f type: 5 // CV_32F
Frame: min val: 16 // Minimum value on the image
Frame: max val: 254 // Maximum value on the image
mCudaImg width: 720; height: 540; pitch: 768
SIFT extraction time = 0.60 ms 3770
Incl prefiltering & memcpy = 2.38 ms 3770
There are 3770 sift points detected by GPU
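The pitch of 768 reported for a width of 720 is consistent with each row being padded up to a multiple of 128 floats. That alignment value is an assumption inferred from these two numbers, not something the output states. A minimal sketch of the rounding, using a hypothetical helper named alignUp:

```cpp
#include <cassert>

// Hypothetical helper: rounds `value` up to the next multiple of `alignment`.
// The alignment of 128 used in the usage note is an assumption.
int alignUp(int value, int alignment) {
    assert(alignment > 0);
    return ((value + alignment - 1) / alignment) * alignment;
}
```

Under that assumption, alignUp(720, 128) gives 768, which matches the printed pitch, and the 1440-wide image from the first run would get a pitch of 1536.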
But if the same code is in the class that handles CudaSift, it raises a segmentation fault inside the Download function. This is the output before the error:
img_32f cols: 720; rows: 540; channels: 1
img_32f type: 5
min val: 16
max val: 254
mCudaImg width: 720; height: 540; pitch: 768
And this is the disassembly:
0x7ffff7e698eb <+ 1051> 4c 89 f7 mov %r14,%rdi
0x7ffff7e698ee <+ 1054> e8 3d a6 de ff callq 0x7ffff7c53f30 <_ZNSolsEi@plt>
0x7ffff7e698f3 <+ 1059> ba 09 00 00 00 mov $0x9,%edx
0x7ffff7e698f8 <+ 1064> 48 8d 35 ef 09 07 00 lea 0x709ef(%rip),%rsi # 0x7ffff7eda2ee
0x7ffff7e698ff <+ 1071> 48 89 c7 mov %rax,%rdi
0x7ffff7e69902 <+ 1074> 49 89 c6 mov %rax,%r14
0x7ffff7e69905 <+ 1077> e8 66 ea de ff callq 0x7ffff7c58370 <_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l@plt>
0x7ffff7e6990a <+ 1082> 8b b5 38 ff ff ff mov -0xc8(%rbp),%esi
0x7ffff7e69910 <+ 1088> 4c 89 f7 mov %r14,%rdi
0x7ffff7e69913 <+ 1091> e8 18 a6 de ff callq 0x7ffff7c53f30 <_ZNSolsEi@plt>
0x7ffff7e69918 <+ 1096> 49 89 c6 mov %rax,%r14
0x7ffff7e6991b <+ 1099> 48 8b 00 mov (%rax),%rax
0x7ffff7e6991e <+ 1102> 48 8b 40 e8 mov -0x18(%rax),%rax
0x7ffff7e69922 <+ 1106> 4d 8b bc 06 f0 00 00 00 mov 0xf0(%r14,%rax,1),%r15
0x7ffff7e6992a <+ 1114> 4d 85 ff test %r15,%r15
0x7ffff7e6992d <+ 1117> 0f 84 b4 02 00 00 je 0x7ffff7e69be7 <_ZN9ORB_SLAM315CudaSiftHandler15extractCudaSIFTERN2cv3MatE+1815>
0x7ffff7e69933 <+ 1123> 41 80 7f 38 00 cmpb $0x0,0x38(%r15)
0x7ffff7e69938 <+ 1128> 0f 84 b2 01 00 00 je 0x7ffff7e69af0 <_ZN9ORB_SLAM315CudaSiftHandler15extractCudaSIFTERN2cv3MatE+1568>
0x7ffff7e6993e <+ 1134> 41 0f be 77 43 movsbl 0x43(%r15),%esi
0x7ffff7e69943 <+ 1139> 4c 89 f7 mov %r14,%rdi
0x7ffff7e69946 <+ 1142> e8 25 b9 de ff callq 0x7ffff7c55270 <_ZNSo3putEc@plt>
0x7ffff7e6994b <+ 1147> 48 89 c7 mov %rax,%rdi
0x7ffff7e6994e <+ 1150> e8 1d b4 de ff callq 0x7ffff7c54d70 <_ZNSo5flushEv@plt>
0x7ffff7e69953 <+ 1155> 4c 89 e7 mov %r12,%rdi
0x7ffff7e69956 <+ 1158> e8 75 02 df ff callq 0x7ffff7c59bd0 <_ZN9CudaImage8DownloadEv@plt>
0x7ffff7e6995b <+ 1163> c5 fa 10 4b 08 vmovss 0x8(%rbx),%xmm1 <------- It breaks at this instruction.
0x7ffff7e69960 <+ 1168> c5 e8 57 d2 vxorps %xmm2,%xmm2,%xmm2
0x7ffff7e69964 <+ 1172> 8b 53 04 mov 0x4(%rbx),%edx
0x7ffff7e69967 <+ 1175> 48 8d 7b 18 lea 0x18(%rbx),%rdi
0x7ffff7e6996b <+ 1179> c5 ea 5a 03 vcvtss2sd (%rbx),%xmm2,%xmm0
0x7ffff7e6996f <+ 1183> 45 31 c0 xor %r8d,%r8d
0x7ffff7e69972 <+ 1186> 31 c9 xor %ecx,%ecx
0x7ffff7e69974 <+ 1188> 4c 89 e6 mov %r12,%rsi
0x7ffff7e69977 <+ 1191> c5 ea 2a 53 10 vcvtsi2ssl 0x10(%rbx),%xmm2,%xmm2
0x7ffff7e6997c <+ 1196> e8 0f e9 de ff callq 0x7ffff7c58290 <_Z11ExtractSiftR8SiftDataR9CudaImageidffbPf@plt>