pyclesperanto_prototype
median_sphere filter fails on large images
Hi all,
@spherelife and I are having issues running the pyclesperanto_prototype.median_sphere filter. The same kernel size (6,6,2) works on a small image but not on a larger one. Error: RuntimeError: clEnqueueReadBuffer failed: OUT_OF_RESOURCES
We can run a smaller kernel, for example (2,2,2), on the large image, but not the (6,6,2) kernel. Surprisingly, after failing on the large image, the filter also stops working on the small image (same error) until we restart the kernel.
I was not expecting that the image size would play a large role in the processing but maybe I'm wrong and misunderstood something?
We are using a VM with a Windows 10 machine and a shared NVIDIA RTXA6000-12Q.
I attach:
- the example datasets: https://drive.switch.ch/index.php/s/4Vmha5Ioy5OQtHY
- the log, a script to recreate the error, the environment recipe as yml, and the command lines for setting up the environment manually: https://drive.switch.ch/index.php/s/7yBtjuxoOqg1rzZ
Let me know if something else could be of use for you.
Thanks!
Hi @sebherbert ,
how large is the large image?
Best, Robert
Hi @haesleinhuepf
The small image we tested is 378x363x77 in xyz and ~82 MB. The large image is 1536x1536x134 and ~2.4 GB.
Best, Fei
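For readers following along, the sizes quoted above can be checked with simple arithmetic. The sketch below is plain Python and only multiplies the stated dimensions by a few candidate voxel sizes. The thread does not say which data type ends up in the GPU buffer, so the three types listed are assumptions for illustration. Note that a filter such as median_sphere needs at least an input and an output buffer, so the footprint on the device is at least twice the single-image figure.

```python
def image_bytes(shape, bytes_per_voxel):
    """Return the raw size in bytes of an image with the given shape."""
    n_voxels = 1
    for extent in shape:
        n_voxels *= extent
    return n_voxels * bytes_per_voxel


# Dimensions as reported in the thread (x, y, z).
SMALL = (378, 363, 77)
LARGE = (1536, 1536, 134)

# Candidate voxel types; which one applies is an assumption.
CANDIDATE_TYPES = (("uint16", 2), ("float32", 4), ("float64", 8))

for label, shape in (("small", SMALL), ("large", LARGE)):
    for type_name, size in CANDIDATE_TYPES:
        megabytes = image_bytes(shape, size) / 1e6
        print(f"{label:5s} {type_name:8s} {megabytes:10.1f} MB")
```

Under these assumptions, the reported ~82 MB and ~2.4 GB are closest to 8 bytes per voxel, while the same images stored as 16-bit data would be roughly four times smaller.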
Hi Robert,
Thanks for the fast follow-up!
As @spherelife was saying, the "large" image is 1536x1536x134 (16 bits, if I recall correctly), so nothing completely crazy :) I guess we could try intermediate image sizes if that makes sense?
Best, Sebastien
Hi @sebherbert and @spherelife ,
if you work on a Windows machine, can you try the solution proposed here and extend the kernel timeout in the registry?
Let me know if this helps!
Best, Robert
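A rough way to see why image size and kernel size both matter for a timeout: on Windows, the driver is reset when a single GPU task runs longer than the TDR timeout (2 seconds by default), and the work in one filter call grows with the number of voxels times the number of neighbours read per voxel. The sketch below estimates that product. It assumes the three kernel numbers are radii of an ellipsoid, which is an assumption for illustration and may not match how median_sphere counts its neighbourhood. The function name `ellipsoid_voxel_count` is hypothetical.

```python
def ellipsoid_voxel_count(rx, ry, rz):
    """Count integer offsets inside an ellipsoid with radii rx, ry, rz."""
    if min(rx, ry, rz) < 1:
        raise ValueError("radii must be integers >= 1")
    # Integer form of (dx/rx)^2 + (dy/ry)^2 + (dz/rz)^2 <= 1,
    # which avoids floating-point rounding on the boundary.
    limit = (rx * ry * rz) ** 2
    wx, wy, wz = (ry * rz) ** 2, (rx * rz) ** 2, (rx * ry) ** 2
    count = 0
    for dz in range(-rz, rz + 1):
        for dy in range(-ry, ry + 1):
            for dx in range(-rx, rx + 1):
                if dx * dx * wx + dy * dy * wy + dz * dz * wz <= limit:
                    count += 1
    return count


def estimated_reads(shape, radii):
    """Voxels in the image times neighbours read per voxel."""
    n_voxels = shape[0] * shape[1] * shape[2]
    return n_voxels * ellipsoid_voxel_count(*radii)


SMALL = (378, 363, 77)
LARGE = (1536, 1536, 134)

size_ratio = (LARGE[0] * LARGE[1] * LARGE[2]) / (SMALL[0] * SMALL[1] * SMALL[2])
kernel_ratio = ellipsoid_voxel_count(6, 6, 2) / ellipsoid_voxel_count(2, 2, 2)
print(f"large/small voxel ratio: {size_ratio:.1f}")
print(f"(6,6,2)/(2,2,2) neighbourhood ratio: {kernel_ratio:.1f}")
print(f"reads, large image, (6,6,2): {estimated_reads(LARGE, (6, 6, 2)):,}")
```

The estimate ignores the cost of sorting each neighbourhood to find the median, so it understates how quickly the runtime grows with kernel size.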
Hi @haesleinhuepf,
Thanks for the reply, and sorry it took us a while to come back to you. In the meantime, @spherelife has tested the same image on a more powerful workstation (A100 card, Linux based). He reported that it ran smoothly (he even tested a 6x6x6 kernel, which also passed without complaint despite being larger than 1000 voxels), so we'll keep this solution for the moment. So I guess this was either an allocation speed issue or a limited VRAM issue, since the larger card is working?
Thanks again for the support,
Best, Sebastien
I just wanted to comment in case anyone else comes across this issue and is looking for assistance besides 'better GPU'. In fact, I came across this issue because processing some images was working on an RTX 3060 12GB, but not on a Quadro RTX 6000 (24GB) workstation GPU. The workstation GPU was raising a CL_INVALID_COMMAND_QUEUE error on slightly larger images, but was handling images 1/4 the size OK. Either way, the images are at least 20 times smaller than the VRAM, and all of them worked on the 3060.
Anyway, I added the following to the registry (previously no key existed):
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000003c
"TdrDdiDelay"=dword:0000003c
in PowerShell with
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" -Name TdrDelay -PropertyType DWord -Value 60 -Force
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" -Name TdrDdiDelay -PropertyType DWord -Value 60 -Force
and am now getting no error on the workstation GPU. In the past I have had other workflows with much larger images that also failed with CL_MEM_OBJECT_ALLOCATION_FAILURE, and I will report back if this change helps there too.
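To double-check the values above from Python, a minimal sketch follows. It uses the standard-library `winreg` module, which exists only on Windows, so the reader function returns an empty dict elsewhere. The helper names `parse_reg_dword` and `read_tdr_settings` are hypothetical. The first helper confirms that the `.reg` value `dword:0000003c` is the same 60 (seconds) passed to PowerShell.

```python
import sys

# Registry location and value names as given in the post above.
TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
TDR_VALUES = ("TdrDelay", "TdrDdiDelay")


def parse_reg_dword(text):
    """Convert a .reg-style value such as 'dword:0000003c' to an int."""
    prefix = "dword:"
    if not text.lower().startswith(prefix):
        raise ValueError(f"not a dword value: {text!r}")
    return int(text[len(prefix):], 16)


def read_tdr_settings():
    """Return {name: value or None}; an empty dict when not on Windows."""
    if sys.platform != "win32":
        return {}
    import winreg  # Windows-only standard-library module

    try:
        key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY)
    except OSError:
        # Key missing or not readable: report every value as unset.
        return {name: None for name in TDR_VALUES}
    settings = {}
    with key:
        for name in TDR_VALUES:
            try:
                value, _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                value = None  # value not present under the key
            settings[name] = value
    return settings


print(parse_reg_dword("dword:0000003c"))  # 60
print(read_tdr_settings())
```

A value of None for TdrDelay means the entry is absent and Windows uses its default timeout of 2 seconds. Changes to these values typically take effect only after a restart.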