Caching of volumes to GPU during training of deepedit (radiology app)
Currently, it is not possible to cache data to the GPU during DeepEdit training in order to accelerate it as described in the Fast Training Tutorial from MONAI Core.
This idea was already mentioned in PR #485, which implemented other acceleration techniques (e.g. DiceCE loss, Novograd optimizer, ThreadDataLoader).
I tried putting the two transforms ToTensord() and ToDeviced() before the first random transform, but that throws an error saying a torch tensor cannot be cast to a numpy array (the error is raised in the transform AddInitialSeedPointMissingLabelsd()).
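For reference, here is a minimal sketch of that attempt, assuming a typical deterministic/random transform split; the surrounding transforms are illustrative, not the exact deepedit pipeline:

```python
from monai.transforms import (
    Compose,
    EnsureChannelFirstd,
    LoadImaged,
    RandFlipd,
    ToDeviced,
    ToTensord,
)

train_pre_transforms = Compose([
    LoadImaged(keys=("image", "label")),
    EnsureChannelFirstd(keys=("image", "label")),
    # deterministic transforms above this point can be cached once,
    ToTensord(keys=("image", "label")),
    ToDeviced(keys=("image", "label"), device="cuda:0"),  # keep cached data on the GPU
    # ... everything from the first random transform onwards runs every epoch;
    # AddInitialSeedPointMissingLabelsd() sits in this part and currently
    # expects numpy input, which is where the cast error comes from.
    RandFlipd(keys=("image", "label"), prob=0.5, spatial_axis=0),
])
```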
Looking into the deepedit training transforms, the error most likely comes from the chamfer distance transform computed with scipy's distance_transform_cdt. I saw in MONAI Core discussion #1332 that @tvercaut pointed to their recent work FastGeodis, which allows fast, CUDA-based (and torch-compatible!) computation of Euclidean/Geodesic distance transforms.
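A rough sketch of what a GPU-resident replacement could look like, assuming FastGeodis' generalised geodesic distance API and using illustrative parameter values (please check the FastGeodis docs for the exact semantics of the mask and parameters):

```python
import torch
import FastGeodis

device = "cuda" if torch.cuda.is_available() else "cpu"

image = torch.rand(1, 1, 64, 64, 64, device=device)  # B x C x D x H x W volume
seed_mask = torch.ones_like(image)                    # assumption: 1 everywhere ...
seed_mask[..., 32, 32, 32] = 0                        # ... and 0 at the click/seed voxel

spacing = [1.0, 1.0, 1.0]  # voxel spacing
v = 1e10                   # large scaling factor
lamb = 0.0                 # 0.0 -> pure Euclidean distance, 1.0 -> pure geodesic
iterations = 4

distance = FastGeodis.generalised_geodesic3d(image, seed_mask, spacing, v, lamb, iterations)
# `distance` stays on the GPU as a torch tensor, so downstream click-simulation
# transforms would not need to round-trip through numpy.
```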
It would be great to revisit the idea of PR #485 and offer caching of images to GPU during training. My simple attempt above is not sufficient: apart from having to make AddInitialSeedPointMissingLabelsd() torch-based, the multi-GPU scenario requires distributed caching across GPUs, and I am not sure where in the MONAI Label code this would go. A rough sketch of the multi-GPU caching pattern is included below.
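The sketch follows the per-rank caching pattern from the MONAI fast-training tutorial and assumes torch.distributed has already been initialized; names such as `datalist` and `train_transforms` are placeholders, not MONAI Label internals:

```python
import torch.distributed as dist
from monai.data import CacheDataset, ThreadDataLoader, partition_dataset

rank = dist.get_rank()
world_size = dist.get_world_size()

# each rank caches only its own shard of the data list
local_datalist = partition_dataset(
    data=datalist,
    num_partitions=world_size,
    shuffle=True,
    even_divisible=True,
)[rank]

# with ToDeviced(device=f"cuda:{rank}") in the deterministic transforms,
# each rank would keep its cached shard on its own GPU
train_ds = CacheDataset(data=local_datalist, transform=train_transforms, num_workers=4)
train_loader = ThreadDataLoader(train_ds, batch_size=2, shuffle=True, num_workers=0)
```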
You already have the option to cache the dataset. By default it is CacheDataset, and it uses ThreadDataLoader: https://github.com/Project-MONAI/MONAILabel/blob/main/monailabel/tasks/train/basic_train.py#L149-L150
Optimizers, transforms, etc. can be defined in your train task definition: https://github.com/Project-MONAI/MONAILabel/blob/main/sample-apps/radiology/lib/trainers/deepedit.py#L76-L80
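As a hedged sketch of what that looks like, a custom trainer can override the corresponding methods (method names follow the linked deepedit trainer; Novograd is just an example taken from the fast-training tutorial, and the other required overrides such as network and transforms are omitted for brevity):

```python
from monai.losses import DiceCELoss
from monai.optimizers import Novograd
from monailabel.tasks.train.basic_train import BasicTrainTask


class MyDeepEditTrainer(BasicTrainTask):
    def optimizer(self, context):
        # swap in a different optimizer here
        return Novograd(context.network.parameters(), lr=0.0001)

    def loss_function(self, context):
        # swap in a different loss here
        return DiceCELoss(to_onehot_y=True, softmax=True)
```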
Please connect with @diazandr3s (Andres), our core developer for DeepEdit, to further optimize the performance of the corresponding transforms. I think with MetaTensor support things can be done more cleanly and, if possible, all the image data can be kept on the GPU while running many of the pre-transforms; that should give a boost with respect to latency.
And yes, if we can run distance_transform_cdt faster, that will help, especially for simulating clicks while training; currently that is the main time-consuming operation (run N times for every batch during training).
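To make the hot path concrete, here is a hedged sketch of the kind of loop being described, a CPU-bound chamfer distance transform computed per label for every sample in every batch (the loop structure is illustrative, not the exact MONAI Label code):

```python
import numpy as np
from scipy.ndimage import distance_transform_cdt


def simulate_click_distances(label_batch: np.ndarray) -> list:
    """label_batch: N x C x D x H x W binary label maps (numpy, on CPU)."""
    distances = []
    for sample in label_batch:      # N samples per batch
        for channel in sample:      # one channel per label
            # runs on CPU and requires numpy input, which is what forces
            # the tensors off the GPU in the current pipeline
            distances.append(distance_transform_cdt(channel))
    return distances
```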
This will be solved as part of the interactivity restructuring: https://github.com/Project-MONAI/MONAILabel/issues/1173