stereo-sgm-opencl

program hangs at clBuildProgram

Open squirelt opened this issue 2 years ago • 12 comments

The program hangs while building the "m_down2up" program. There are no error messages whatsoever; it simply hangs on this line. The previous programs build successfully.

squirelt avatar Feb 09 '23 22:02 squirelt

Hi, could you please share more details, such as which hardware, OS, compiler, and device driver you are using?

siposcsaba89 avatar Feb 11 '23 07:02 siposcsaba89

I tried with the latest NVIDIA driver and issue #6 came up. It may be connected to your issue as well. Could you please try the latest master and check whether it fixes the problem for you?

siposcsaba89 avatar Feb 13 '23 06:02 siposcsaba89

On Windows:

  • Compiler: MSVC 19.34.31937.0
  • Hardware: NVIDIA T1000, Driver: NVIDIA 30.0.14.7239

On Ubuntu 20.04:

  • Compiler: GNU 9.4.0
  • Hardware: NVIDIA TITAN RTX, Driver Version: 515.65.01

Unfortunately, trying the latest master did not fix my problem on either OS. Thank you very much for your help!

squirelt avatar Feb 13 '23 08:02 squirelt

I have tried on both Windows and Linux, but unfortunately I couldn't reproduce the issue. Are you using the stereo move example? I updated it to better handle the cases where the input cannot be opened or the input image has zero size. Could you pull the latest version, do a clean build, and test it again? Also, is there any message on the command line?

siposcsaba89 avatar Feb 13 '23 17:02 siposcsaba89

Hi, yes, I did try it with the new master, but it didn't work. And since the code hangs on one line, there is no error message. A friend of mine could make it work, but we don't know yet what is wrong in my case. Since I tracked down, in debug mode, the line of code that does not work, I also know that loading images is not the problem.

squirelt avatar Feb 14 '23 06:02 squirelt

Is it possible that your computer has multiple platforms available (for example Intel, NVIDIA, AMD)? You can select the platform by specifying --platform_idx=X, where X is the index of the platform on your computer. The default value is 0.

Also, when the application starts, it should print the name of the platform and the device it is using. In my case, for example:

Platform name: NVIDIA CUDA
Device name: NVIDIA GeForce GTX 1070
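
To find out which index to pass as --platform_idx, one option is to search the platform names for a vendor string. The following is only a sketch: it assumes the names have already been collected in enumeration order (for example with clGetPlatformInfo and CL_PLATFORM_NAME), and find_platform_index is a hypothetical helper, not part of stereo-sgm-opencl.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the project): returns the index of the first
// platform whose name contains `vendor`, or -1 if no platform matches.
// `platform_names` is assumed to be in the order OpenCL enumerates platforms.
int find_platform_index(const std::vector<std::string>& platform_names,
                        const std::string& vendor) {
    for (std::size_t i = 0; i < platform_names.size(); ++i) {
        if (platform_names[i].find(vendor) != std::string::npos) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

The returned index would then be the value to try for X in --platform_idx=X.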

siposcsaba89 avatar Feb 14 '23 13:02 siposcsaba89

Yes, this works correctly. It selects NVIDIA CUDA as well and then the NVIDIA T1000 graphics card. Afterwards the program hangs.

Is it possible that I need a specific version of CUDA and a specific driver? Also, what build toolchain / compiler do you use?

squirelt avatar Feb 14 '23 15:02 squirelt

I have managed to reproduce the issue. At the office we have an RTX 3090, and the same issue is present. I tried to debug it by decoupling the compile and link steps of the programs, and it hangs in the link phase. I don't know yet what causes the issue, but I think it could be related to GPU architecture, because older Pascal GPUs (1080, 1070) work fine, but Turing and later do not.

siposcsaba89 avatar Feb 15 '23 12:02 siposcsaba89

Actually, I came to a similar conclusion. The only hardware on which I was able to run the code without clBuildProgram hanging was my very old GTX 760. The same goes for my colleague: he uses a GTX 1050 and it also works fine.

For the other hardware, I found that disabling optimization for the program build resolves the hang, but naturally leads to slower execution. More precisely, I pass the option "-cl-opt-disable" to clBuildProgram:

    // Workaround: disable OpenCL compiler optimizations (kernels run slower).
    const char* options = "-cl-opt-disable";
    err = clBuildProgram(m_cl_program, 1, &device, options, nullptr, nullptr);

in file device_kernel.h.
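
Since the workaround costs performance, it may be worth making it switchable instead of hard-coding it. A minimal sketch, assuming a hypothetical helper make_build_options (not part of the project) that assembles the option string; "-cl-opt-disable" is the standard OpenCL build option used above, and "-DFOO=1" in the usage is only a placeholder define:

```cpp
#include <string>

// Hypothetical helper (not from the project): builds the option string that
// is passed to clBuildProgram. `extra` holds any other options the caller
// already uses; "-cl-opt-disable" is appended only when requested.
std::string make_build_options(bool disable_optimizations,
                               const std::string& extra = "") {
    std::string options = extra;
    if (disable_optimizations) {
        if (!options.empty()) {
            options += ' ';
        }
        options += "-cl-opt-disable";
    }
    return options;
}
```

The result could be passed as make_build_options(true).c_str() in place of the options argument, so that unaffected GPUs keep the optimized build.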

squirelt avatar Feb 15 '23 14:02 squirelt

Regarding your suggestion about architecture: I could run the program on NVIDIA hardware with the Kepler, Pascal, and Ampere architectures, but not on Turing. That might turn out to be the problem.

squirelt avatar Feb 15 '23 16:02 squirelt

It is a very weird issue. It seems that there are segfaults in some kernels, and that causes the hang. If the input size is rounded to a multiple of 4, then it works, at least for me, with 4-path aggregation; 8-path still doesn't work, so I need to debug it more. I updated master with the rounding of the input image size in 8cf8b251dbe2d4ebdc37faa20e7457205d03af63. Could you try it?
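
For illustration, rounding a dimension to a multiple of 4 could look like the sketch below. The commit itself is not shown here, so rounding up (padding) rather than down (cropping) is an assumption, and round_up_to_multiple is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical helper (not taken from the commit): rounds `value` up to the
// next multiple of `multiple`; values already aligned are returned unchanged.
int round_up_to_multiple(int value, int multiple) {
    assert(value >= 0 && multiple > 0);
    return ((value + multiple - 1) / multiple) * multiple;
}
```

With this, a width of 638 would become 640, while 640 stays 640.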

siposcsaba89 avatar Feb 16 '23 13:02 siposcsaba89

Yes it works, very nice! Thanks a lot for the quick fix.

squirelt avatar Feb 16 '23 16:02 squirelt