
Segmentation does not run on RTX 3080.

Open • vincenzoml opened this issue on Feb 26, 2021 • 13 comments

BraTS-Toolkit does not yet work with RTX 3080 GPUs. Is there a fix already, or is fixing this planned?

Details: when running e.g. isen-20 via the BraTS-Toolkit using nvidia-docker2, I see the following in the log:

WARNING: Detected NVIDIA GeForce RTX 3080 GPU, which is not yet supported in this version of the container
ERROR: No supported GPU(s) detected to run this container

Indeed, a container with CUDA 11 is required. I do not have the necessary expertise to adapt the container to CUDA 11 on my own, but I could test a CUDA 11 image and verify that the GPU is correctly detected.
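A quick way to see the mismatch behind this message is to compare the card's compute capability with the architectures a given PyTorch binary was compiled for. Below is a minimal sketch, not an official BraTS-Toolkit utility, assuming a CUDA-enabled PyTorch can be imported (on the host or inside one of the containers); an RTX 3080 reports compute capability 8.6 (sm_86, Ampere), which older CUDA builds were not compiled for.

```python
# Minimal diagnostic (assumption: a CUDA-enabled PyTorch is importable here).
import torch

print("PyTorch:", torch.__version__)
print("CUDA used to build PyTorch:", torch.version.cuda)

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print("GPU:", torch.cuda.get_device_name(0), f"-> sm_{major}{minor}")
    # get_arch_list() only exists in newer PyTorch releases; builds inside
    # the 2020 containers may predate it.
    if hasattr(torch.cuda, "get_arch_list"):
        print("architectures compiled into this build:", torch.cuda.get_arch_list())
else:
    print("No usable CUDA device is visible to this PyTorch build.")
```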

vincenzoml avatar Feb 26 '21 10:02 vincenzoml

Thanks for reporting this. I successfully tested the 2020 dockers on a CUDA 11.0 system with Quadro RTX 6000 and 8000 cards, so CUDA 11 on your host system is not the issue; the quite recent RTX 3080 is.

Image-specific issues like this one can only be fixed by the authors of the images. You could try contacting Fabian Isensee to ask for an update. If Fabian decides to update the image, I could then push the new image to the btk repo on Docker Hub.

neuronflow avatar Feb 26 '21 10:02 neuronflow

Hi, thanks for the prompt reply. You can keep the subject as you prefer, but as a matter of fact I have not been able to run ANY of the segmentations on the GeForce RTX 3080. I believe all of the images would need to be "rebased" onto the new GPUs, which follows from the very nature of Docker images.

On the one hand, I understand this issue requires cooperation from all the image authors; on the other hand, being quite new to the use of GPUs in Docker images, I think BraTS-Toolkit should take some countermeasures, since otherwise the tool becomes unusable on newer GPUs.

vincenzoml avatar Feb 26 '21 11:02 vincenzoml

(Reading helps, my bad! I didn't see you posted the error message. Apologies.)

The software simply does not support Ampere. That is probably unfixable without recreating each container with a newer version of whatever framework it uses (PyTorch in my case).

FabianIsensee avatar Feb 26 '21 11:02 FabianIsensee

Exactly, I don't see a way BTK could mitigate these issues. NVIDIA could address this by producing GPUs and drivers with more downward/upward compatibility and extended support durations.

neuronflow avatar Feb 26 '21 11:02 neuronflow

Hi, to add to the previous comments, here are some errors from other images on the RTX 3080 GPU:

hnfnetv1-20, mic-dkfz: RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR

sanet0-20, yixinmpl-20: RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

scan: Found GPU0 GeForce RTX 3080 which requires CUDA_VERSION >= 9000 for optimal performance and fast startup time, but your PyTorch was compiled with CUDA_VERSION 8000. Please install the correct PyTorch binary using instructions from http://pytorch.org

For the scan algorithm the error is only a warning, but the final result is not correct.
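For the cuDNN errors above, a small smoke test run inside a container reproduces the failure up front instead of partway through a case. A hedged sketch, assuming the container's Python environment exposes its bundled PyTorch:

```python
import torch
import torch.nn as nn

print("torch", torch.__version__,
      "| CUDA", torch.version.cuda,
      "| cuDNN", torch.backends.cudnn.version())

# One small 3D convolution on the GPU; on an unsupported card this surfaces
# the same cuDNN errors quoted above immediately.
conv = nn.Conv3d(in_channels=4, out_channels=8, kernel_size=3).cuda()
x = torch.randn(1, 4, 32, 32, 32, device="cuda")
try:
    y = conv(x)
    torch.cuda.synchronize()
    print("cuDNN convolution OK, output shape:", tuple(y.shape))
except RuntimeError as err:
    print("cuDNN convolution failed:", err)
```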

gina-belmonte avatar Feb 26 '21 13:02 gina-belmonte

@neuronflow @FabianIsensee: my aim is to reproduce some methods from the BraTS leaderboard, in order to compare per-case results to what we obtain with a simple, non-machine-learning-based specification (in case you are curious: see www.voxlogica.com and the associated papers).

Then: I mentioned CUDA 11 above; from my limited understanding of the matter, that is not a requirement for the host system but for the guest. Unfortunately, this is how NVIDIA develops their Docker system (it would be nice of them to provide a compatibility layer in nvidia-docker2), but on the other hand that is what we have.

My take on "mitigation" is that if the Docker images were accompanied by the steps to rebuild them, or even better a Dockerfile, then users could try to "rebase" all the commands onto one of the Ubuntu images provided by NVIDIA (see below) and provide patched images themselves.

See e.g. https://hub.docker.com/r/nvidia/cuda; the list of images there probably covers all the base images of all the BraTS segmentation algorithms.

So @FabianIsensee, if you have a Dockerfile, can you share it? I will try to change it. Otherwise, are there instructions elsewhere for your algorithms? (A small check of which PyTorch/CUDA build an image currently ships is sketched below.)

(and thanks in advance for any hint)
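Before attempting any rebase, it may help to see which PyTorch/CUDA build each segmentation image actually ships. A sketch under stated assumptions: the image name is a placeholder, the entrypoint override assumes a `python` binary on the image's PATH (some images may use `python3`), and not every image is necessarily PyTorch-based.

```python
import subprocess

IMAGE = "some-brats-segmentation-image:tag"  # placeholder, not a real tag

# Ask the Python interpreter inside the image which PyTorch/CUDA build it ships;
# no GPU is needed for this, so it works even where the container refuses to start.
probe = ("import torch; "
         "print('torch', torch.__version__, '| built with CUDA', torch.version.cuda)")

result = subprocess.run(
    ["docker", "run", "--rm", "--entrypoint", "python", IMAGE, "-c", probe],
    capture_output=True, text=True,
)
print(result.stdout.strip() or result.stderr.strip())
```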

vincenzoml avatar Feb 26 '21 14:02 vincenzoml

Hi @vincenzoml, the problem is not the Dockerfile. The problem is that in order to recreate the Docker image with a newer NGC base you need to have access to everything: the code, the weights and the Dockerfile. As far as I know, most participants do not make any of this public (we are one of the only teams to consistently do so). I can recreate our Docker image with a newer NGC base, but others will not want to put in the effort. Even for me that is rather inconvenient, because I would need to go back and see whether I can assemble everything in one place again, and then I would have to retest everything to make sure it produces the same results as before. All of that is probably about 2-3 hours of work and I really do not have the time at the moment.

I would like to be able to rely on the Docker image to just work, even with newer generations of GPUs - that is unfortunately beyond my (and of course BTK's) control. The easiest way for you to run everything would be to get hold of a Turing or Volta GPU :-D Even if I go ahead and build a new Docker image, it will be difficult to convince all the others to do so as well.
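On the retesting point: comparing the outputs of the original and the rebuilt image case by case can be scripted. A minimal sketch, not part of BraTS-Toolkit, assuming nibabel and numpy are available; the file paths are placeholders and both segmentations are assumed to live on the same voxel grid.

```python
import nibabel as nib
import numpy as np

# Placeholder paths: segmentation of one case from the old and the rebuilt image.
old_seg = nib.load("case001_seg_old_image.nii.gz").get_fdata().astype(np.int16)
new_seg = nib.load("case001_seg_rebuilt_image.nii.gz").get_fdata().astype(np.int16)

print("voxels that changed:", int(np.sum(old_seg != new_seg)))

# Per-label Dice overlap between the two outputs (background label 0 excluded).
labels = (set(np.unique(old_seg)) | set(np.unique(new_seg))) - {0}
for label in sorted(labels):
    a, b = old_seg == label, new_seg == label
    dice = 2 * np.logical_and(a, b).sum() / max(a.sum() + b.sum(), 1)
    print(f"label {int(label)}: Dice {dice:.4f}")
```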

FabianIsensee avatar Feb 26 '21 14:02 FabianIsensee

Dear all, I was not fully aware of the situation. So, to make this very clear: at the moment, due to the GPU I have, I have no hope of replicating most of the results from previous challenges even if a Docker image is provided? If so, I can try to bug NVIDIA about these issues; Docker is meant for safe replication, and here it is doing the exact opposite job?

Also (off-topic for this issue, but answering here could help others in my situation): are the segmentation results from all methods of the challenge available somewhere? And what about the ground truth of the validation set?

@FabianIsensee indeed it is not necessary for you to rebuild your image, since your algorithm also runs on CPU, which is sufficient for our purposes.

vincenzoml avatar Feb 26 '21 15:02 vincenzoml

I believe this is indeed something you could try to bug NVIDIA with :-D AFAIK you cannot change the base image of a container after it was created (I might be wrong about this, though), so there is not really a way to make older Docker images work with newer GPUs. Best, Fabian

FabianIsensee avatar Feb 26 '21 15:02 FabianIsensee

"docker is meant for safe replication and it's doing the exact opposite job?" no, it does the job, but only within defined limits.

BraTS submissions and GT for the validation and test sets are not publicly available. You can contact individual authors regarding their submissions, but the BraTS organizers cannot provide them to you unless you write a well-defined research proposal. If you want to go this route, I would recommend contacting Bjoern and Spyros.

@vincenzoml how big is the dataset you want to segment? I might be able to do it for you. We can schedule a call and discuss your project after the MICCAI deadline.

neuronflow avatar Feb 26 '21 15:02 neuronflow

"I believe this is indeed something you could try to bug NVIDIA with :-D AFAIK you cannot change the base image of a container after it was created (I might be wrong about this, though), so there is not really a way to make older Docker images work with newer GPUs. Best, Fabian"

True, changing the base image requires rebuilding the image from scratch due to the layered structure of Docker images. Thanks for participating in the discussion, btw :)

neuronflow avatar Feb 26 '21 16:02 neuronflow

@neuronflow thanks for the hints. We can certainly consider a well-defined research proposal; first, though, we need to explore our ideas a bit on some data, and for that it would be especially important to get some ground truth on data that has not been used for training of e.g. ISEN-20 or other BraTS methods. Do you have advice on this?

We will consider re-training on smaller subsets of the testing set where possible.

Thanks for your offer to segment the data yourself. We will take this into account, but for the moment I believe we can use the algorithms that we manage to operate (including isen-20 on CPU). Our research is focused on the BraTS dataset for now; we do not have additional datasets.

W.r.t. the "job" of docker, I second your point (the same would happen with different GPU makes) but I still think NVidia could try to emulate the interface of older GPUs in their docker images so that the derived images last "longer" on the market. I will check if there's a place to discuss such issues with nvidia even though I understand it is very difficult.

vincenzoml avatar Feb 26 '21 17:02 vincenzoml

Yes, we do have a pending paper with an internal dataset from TUM, GT annotations and BraTS algorithmic segmentations.

neuronflow avatar Feb 26 '21 19:02 neuronflow