
Support for Intel XPU / Intel ARC GPUs

Open Sikerdebaard opened this issue 1 year ago • 4 comments

Hi everyone,

I am excited to announce that I have begun adding Intel XPU support to nnUNet through IPEX (the Intel Extension for PyTorch), which will allow training and inference on Intel ARC GPUs. However, the code needs further testing and optimization before it can be merged, so I am sharing it with the community in the hope that others can contribute to this effort.

Currently, the code has only been tested on CPU and Intel XPU, so there may still be bugs that need to be addressed. I have also noticed that training on an AMD 7900X CPU is faster than training on the A770 Intel ARC GPU with this code. Additionally, the XPU backend only supports BFloat16 precision at this time.
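To illustrate the backend-abstraction idea being discussed, here is a minimal, hypothetical sketch of runtime device selection that degrades gracefully when a backend is missing. The function name `pick_device` and the preference order are illustrative assumptions, not nnUNet's actual API; IPEX registers the `xpu` device in PyTorch when imported.

```python
def pick_device() -> str:
    """Pick an available accelerator, falling back to CPU.

    Hypothetical sketch of backend selection; not nnUNet's real code.
    """
    try:
        import torch
    except ImportError:
        return "cpu"  # no torch at all: nothing else to select

    # IPEX (intel_extension_for_pytorch) registers the 'xpu' device on import.
    try:
        import intel_extension_for_pytorch  # noqa: F401
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            return "xpu"
    except ImportError:
        pass

    if torch.cuda.is_available():
        return "cuda"
    # Apple Silicon backend; guarded because older torch builds lack it.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```

The preference order here (xpu, then cuda, then mps, then cpu) is a design choice; the point is that callers only ever see a device string and never import backend-specific packages themselves.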

If you are interested in helping with this project, please feel free to contribute or provide feedback.

Sikerdebaard avatar Mar 14 '23 14:03 Sikerdebaard

Hey, thanks for this amazing work! I like how you are abstracting the backends into separate classes. Today we released nnU-Net v2, which already extends the supported devices to cuda, cpu and mps. For the next couple of weeks I will be quite busy with an upcoming evaluation, but after that I would like to discuss how we can use this principle in nnU-Net v2 in order to make integrating new devices less tedious. May I get back to you on that?

FabianIsensee avatar Mar 17 '23 17:03 FabianIsensee

Hey Thomas, I think fabric is the way to go for this in the future. I will work on adding fabric to nnU-Net soon: https://lightning.ai/pages/open-source/fabric/

FabianIsensee avatar Mar 31 '23 15:03 FabianIsensee

Hi Fabian,

If you are looking into frameworks as a solution, then ONNX might be worth considering as well. It is backed by Microsoft. It seems that neither framework, Lightning nor ONNX, supports Intel XPU out of the box for training yet, but for inference ONNX can already use the XPU through the oneDNN API. Furthermore, with ONNX it is possible to convert the model to TensorFlow and then to TensorFlow.js, which could be a useful addition.
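For context, exporting a PyTorch model to ONNX for inference looks roughly like the following. This is a hypothetical sketch with a toy model, not nnU-Net's export path; the model, tensor shapes, and file name are all illustrative, and the full export requires `torch` and the `onnx` package to be installed.

```python
def export_to_onnx(path: str = "toy_model.onnx") -> str:
    """Export a toy 3D conv model to ONNX. Illustrative sketch only."""
    try:
        import torch
    except ImportError:
        return "torch not installed; sketch only"

    # Stand-in for a segmentation network (nnU-Net models are far larger).
    model = torch.nn.Sequential(torch.nn.Conv3d(1, 8, kernel_size=3, padding=1))
    model.eval()
    dummy = torch.randn(1, 1, 16, 16, 16)  # (batch, channel, x, y, z)

    try:
        torch.onnx.export(
            model,
            dummy,
            path,
            input_names=["image"],
            output_names=["seg"],
            # Allow variable batch size at inference time.
            dynamic_axes={"image": {0: "batch"}, "seg": {0: "batch"}},
        )
    except Exception as exc:  # e.g. the onnx package is missing
        return f"export failed: {exc}"
    return "exported"


print(export_to_onnx())
```

The resulting `.onnx` file could then be run with ONNX Runtime (which selects execution providers such as oneDNN) or fed into ONNX-to-TensorFlow converters, as mentioned above.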

Sikerdebaard avatar Mar 31 '23 15:03 Sikerdebaard

Hey, I am quite confident that fabric will support XPUs soon. I recently talked to one of their developers, and they seem highly motivated to include everything that is needed for broad adoption. I like how fabric seamlessly integrates into existing pytorch code, which is why I prefer this solution. It works for both training and inference. If certain formats, like ONNX, are required for running inference in some circumstances, then it would be better to have some ONNX export code that takes care of that.
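To show what "seamlessly integrates" means in practice, here is a minimal sketch of wrapping an ordinary PyTorch training step with Lightning Fabric. The model and data are toy stand-ins, and the code only runs fully if `lightning` and `torch` are installed; otherwise it reports that it is a sketch.

```python
def fabric_training_step() -> str:
    """One training step wrapped with Lightning Fabric. Toy sketch."""
    try:
        import torch
        from lightning.fabric import Fabric
    except ImportError:
        return "lightning/torch not installed; sketch only"

    # accelerator="auto" picks cuda/mps/cpu (and, once supported, xpu).
    fabric = Fabric(accelerator="auto", devices=1)
    fabric.launch()

    model = torch.nn.Linear(4, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    # setup() moves the model/optimizer to the selected device and
    # wraps them for the chosen strategy; the loop itself is unchanged.
    model, optimizer = fabric.setup(model, optimizer)

    x = torch.randn(8, 4, device=fabric.device)
    y = torch.randn(8, 1, device=fabric.device)
    loss = torch.nn.functional.mse_loss(model(x), y)
    fabric.backward(loss)  # replaces loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return "one fabric step completed"


print(fabric_training_step())
```

The key point is that only the device placement and the `backward()` call change; the rest of the loop stays plain PyTorch, which is what makes adding a new backend a configuration change rather than a rewrite.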

FabianIsensee avatar Apr 06 '23 08:04 FabianIsensee