Stuck at "We will use the optimizer Powell with default settings."

Open fkm24 opened this issue 1 year ago • 2 comments

Hi Robbert,

We are running into an issue with the MDT pipeline on our HPC GPU nodes. The problem occurs at a specific stage while fitting the NODDI model, right after the message "We will use the optimizer Powell with default settings." At this point the pipeline becomes unresponsive, and no further progress or output appears in the log files.

I watched the node for about 4 hours, until my session expired, and saw one or two CPU cores constantly busy but no output being generated. Similarly, MDT detects 4 GPUs but uses only one while the code is running, and switching between them makes no difference. I have also tried different CUDA and nibabel versions, to no avail.

If you could spare some time to advise us on how to proceed, or if you need additional information to better understand the issue, we would greatly appreciate your help.

Many thanks! Eli

fkm24 · Nov 23 '23 18:11

Hi Eli,

This is unfortunately a very common problem, and a very hard one to solve. The pipeline hangs during kernel compilation. To run code on the GPU, MDT hands a kernel (the source code the GPU will execute) to the OpenCL compiler at runtime. These kernels are relatively large and complex, and compilers sometimes trip over them. I have seen cases where, on the same GPU, an older driver worked fine, a newer one did not, and an even newer one worked again. These compiler bugs seem to come and go.
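Because a compiler hang blocks silently rather than failing, one practical guard is to run a short smoke test (for example, an MDT fit on a small test mask; the exact command line is up to you and is not shown here) under a timeout, so an affected driver is detected in minutes instead of hours. The sketch below is a generic Python watchdog, not part of MDT; `run_with_timeout` and the sleeping stand-in command are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments); return True if it finished within
    timeout_s seconds, False if it had to be killed.

    subprocess.run kills the child when the timeout expires and then raises
    TimeoutExpired, so a hung step cannot block the caller forever.
    """
    try:
        subprocess.run(cmd, timeout=timeout_s, check=False)
    except subprocess.TimeoutExpired:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical stand-in for a step that may hang; replace it with the
    # real smoke-test command you want to guard.
    hang = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_timeout(hang, timeout_s=2))  # False: killed after 2 s
```

The timeout must sit comfortably above the normal runtime of the smoke test, otherwise a slow but healthy run is killed too. This only detects the hang; it does not work around the compiler bug.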

What you could try is installing the latest version of POCL as a CPU runtime. It will not give you GPU acceleration yet, but at least you can start experimenting with MDT. After that, you could try upgrading the GPU drivers to see if that helps. Unfortunately, that is the best I can do; OpenCL driver support is somewhat volatile.
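After installing POCL, it helps to confirm that the OpenCL ICD loader can actually see it. On Linux the loader discovers platforms by reading `.icd` files, by default in `/etc/OpenCL/vendors`, each naming one vendor library (tools such as `clinfo` give a fuller report). The sketch below simply lists those files; the function name is hypothetical, and on HPC systems or conda installs the directory may differ, hence the parameter.

```python
from pathlib import Path


def list_opencl_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Map each .icd file name to the runtime library it points at.

    The OpenCL ICD loader enumerates installed platforms (NVIDIA, POCL, ...)
    from the .icd files in this directory; each file holds the name or path
    of one vendor library.
    """
    icds = {}
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return icds  # no loader configuration at this location
    for icd in sorted(directory.glob("*.icd")):
        icds[icd.name] = icd.read_text().strip()
    return icds


if __name__ == "__main__":
    for name, library in list_opencl_icds().items():
        print(f"{name}: {library}")
```

If no POCL entry shows up, MDT cannot select it as a device regardless of how POCL itself was built.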

Best, Robbert

robbert-harms · Nov 23 '23 20:11

Hi Robbert,

Many thanks for the reply.

Unfortunately, I could not get POCL to work with MDT either. However, I tried the Singularity recipe here, and the NODDI model ran to completion on the HPC.

But alas, the ODI values were fixed at 0.5 throughout the entire brain mask, so I am not sure what is happening. At this point I am at a loss as to how to proceed.
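Before drawing conclusions from a fitted map, a quick numeric check shows whether it is truly constant inside the mask or only looks that way in a viewer. The sketch below is a hypothetical helper, not an MDT tool; it takes the in-mask voxel values as a flat sequence, which with nibabel could come from something like `nib.load(odi_path).get_fdata()[mask > 0]` (file paths are yours to supply).

```python
import statistics


def summarize_map(values, tol=1e-6):
    """Summarize in-mask voxel values of a parameter map.

    Returns (minimum, maximum, population std. dev., is_constant), where
    is_constant is True when max - min <= tol (tol is an arbitrary choice).
    A map that is constant over the whole brain mask is a red flag to
    investigate before any analysis.
    """
    values = [float(v) for v in values]
    if not values:
        raise ValueError("no in-mask voxels supplied")
    lo, hi = min(values), max(values)
    return lo, hi, statistics.pstdev(values), (hi - lo) <= tol
```

For example, `summarize_map([0.5, 0.5, 0.5])` returns `(0.5, 0.5, 0.0, True)`, the pattern described above.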

Another option is acquiring a dedicated workstation for MDT in my lab. However, I'm uncertain about the process, given the apparent complexity involving factors such as OpenCL drivers. If anyone has successfully run MDT on a GPU, please share your workstation configuration.

I also explored AMICO, but within individual subjects the correlation between its regional ODI values and those from MDT (obtained before MDT stopped working on our HPC) was only weak to moderate.

AMICO offers an option to save modulated ODI maps (ODI scaled by the tissue fraction, i.e. the inverted free-water map), and these agree more closely with the original MDT values.
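Following the description above, the modulated map is ODI_mod = ODI × (1 − f_iso), where f_iso is the free-water fraction, so voxels dominated by free water (such as CSF) are pushed toward 0. The sketch below applies that voxel-wise to plain sequences; the function name is hypothetical, and AMICO's exact computation may differ in details, so its documentation is the authority.

```python
def modulate(odi, fiso):
    """Scale ODI voxel-wise by the tissue fraction (1 - free-water fraction).

    odi and fiso are equal-length sequences of voxel values in [0, 1].
    """
    if len(odi) != len(fiso):
        raise ValueError("odi and fiso must have the same length")
    return [o * (1.0 - f) for o, f in zip(odi, fiso)]
```

For instance, a voxel with ODI 0.2 and a free-water fraction of 0.5 ends up with a modulated ODI of 0.1.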

Best Wishes, Eli

fkm24 · Dec 04 '23 11:12