Improve HoVerNet postprocessing performance

Open bhashemian opened this issue 2 years ago • 12 comments

HoVerNet post-processing is an essential part of the HoVerNet pipeline, but it takes too much time (far more than inference itself). @JHancox has some experience making it run much faster, but outside of MONAI. Although I am working with the cuCIM team to get the necessary scikit-image functions accelerated on GPU so that post-processing can run there, it would be great if we could take advantage of some CPU acceleration while the GPU is busy running inference.

CC @Nic-Ma @KumoLiu
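Running CPU post-processing while the GPU works on the next batch can be sketched with a thread pool from the standard library. This is a minimal sketch, not MONAI code: pipelined_inference, infer and postprocess are hypothetical names standing in for the model forward pass and the HoVerNet post-processing. Threads only overlap usefully when both steps release the GIL, which is often the case for PyTorch GPU calls and for much of the C code in NumPy/SciPy, but not for pure-Python post-processing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List

import numpy as np


def pipelined_inference(
    batches: Iterable[np.ndarray],
    infer: Callable[[np.ndarray], np.ndarray],
    postprocess: Callable[[np.ndarray], Any],
    num_workers: int = 4,
) -> List[Any]:
    """Run inference in the calling thread and post-process on a thread pool.

    Post-processing of batch i is submitted to the pool, so it can run on CPU
    threads while the calling thread already runs inference on batch i + 1.
    """
    futures = []
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for batch in batches:
            output = infer(batch)  # hypothetical forward pass (e.g. on GPU)
            futures.append(pool.submit(postprocess, output))
        # Results are collected in the order the batches were submitted.
        return [future.result() for future in futures]


# Toy stand-ins: "inference" doubles the input, "post-processing" sums it.
batches = [np.full((2, 2), i, dtype=np.float32) for i in range(5)]
results = pipelined_inference(batches, lambda b: b * 2.0, lambda o: float(o.sum()))
print(results)  # [0.0, 8.0, 16.0, 24.0, 32.0]
```

In a real pipeline the futures list would be drained periodically rather than held for a whole slide, to bound memory.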

bhashemian, Feb 22 '23

Happy to run through some of what I did. I used three main techniques:
1) Using a pool of threads to process batches of tiles concurrently.
2) Yielding dynamically thresholded tiles to the DataLoader.
3) Caching strips of image data from the WSI for more efficient IO during step 2.
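Techniques 2 and 3 can be combined in a single generator. This is an illustrative sketch under assumptions, not @JHancox's code: iter_tissue_tiles and the read_strip callback are hypothetical, and since the comment does not say how the threshold is chosen dynamically, background_level is simply a parameter here (it could, for example, be estimated from a slide thumbnail). A generator like this could back a torch.utils.data.IterableDataset that feeds the DataLoader.

```python
from typing import Callable, Iterator, Tuple

import numpy as np


def iter_tissue_tiles(
    read_strip: Callable[[int, int], np.ndarray],
    slide_height: int,
    tile_size: int,
    background_level: float,
    min_tissue_fraction: float = 0.1,
) -> Iterator[Tuple[Tuple[int, int], np.ndarray]]:
    """Yield ((y, x), tile) for every tile that contains enough tissue.

    read_strip(y, height) is a hypothetical reader returning a full-width
    grayscale strip of the slide covering rows [y, y + height).
    Edge remainders smaller than one tile are skipped in this sketch.
    """
    for y in range(0, slide_height - tile_size + 1, tile_size):
        # One large read per row of tiles; every tile in the row is then
        # sliced from this in-memory strip instead of being read separately.
        strip = read_strip(y, tile_size)
        for x in range(0, strip.shape[1] - tile_size + 1, tile_size):
            tile = strip[:, x : x + tile_size]
            # Pixels darker than the background level are counted as tissue.
            tissue_fraction = float(np.mean(tile < background_level))
            if tissue_fraction >= min_tissue_fraction:
                yield (y, x), tile


slide = np.full((512, 768), 240, dtype=np.uint8)  # bright background
slide[256:512, 0:256] = 100  # one dark "tissue" tile
tiles = list(iter_tissue_tiles(lambda y, h: slide[y : y + h], 512, 256, background_level=200))
print([pos for pos, _ in tiles])  # [(256, 0)]
```

Reading full-width strips trades memory for fewer, larger reads, which usually suits WSI readers better than many small region requests.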

JHancox, Feb 22 '23

Hi @drbeh, as this has become a limiting step for me too, I wanted to ask: why not use dask-image functions and dask.array.map_blocks here, to at least multi-thread on CPUs?

OmarAshkar, Mar 02 '23

Hi @omashkartrx,

Have you tried dask to see whether its multi-threading helps here? Since dask's threaded scheduler does not circumvent the Python GIL, it can only provide parallelism for non-Python code (including NumPy operations), so I don't know how much it can help with scikit-image operations. https://docs.dask.org/en/stable/scheduling.html#local-threads

@Nic-Ma, we are not using dask anywhere in MONAI, right?

bhashemian, Mar 06 '23

I haven't used dask before.

Thanks.

Nic-Ma, Mar 06 '23

Hi @drbeh and @Nic-Ma. I don't know exactly how it works, but I have used dask with watershed before. I loaded a WSI with it and it took a few seconds; without dask, it would not even load.

For examples, see: https://examples.dask.org/applications/image-processing.html https://www.kaggle.com/code/kmader/3d-image-analysis-using-dask/notebook

I am not sure which post-processing steps are needed besides watershed, but I am sure it works with scikit-image's watershed at least.
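Processing a large image chunk by chunk only gives correct results for neighbourhood operations if each chunk sees some context from its neighbours; dask.array.map_overlap provides this through its depth argument. The NumPy-only sketch below illustrates the same idea without adding dask as a dependency (apply_with_overlap is a hypothetical helper; the chunks run sequentially here, though each could be submitted to a thread pool). For watershed or connected-component labelling the halo alone is not enough: labels produced independently per chunk would also have to be made unique and merged across chunk boundaries.

```python
from typing import Callable, Optional

import numpy as np


def apply_with_overlap(
    image: np.ndarray,
    func: Callable[[np.ndarray], np.ndarray],
    chunk: int,
    halo: int,
) -> np.ndarray:
    """Apply a shape-preserving 2D function chunk by chunk, with a halo.

    Each chunk is processed together with up to `halo` extra pixels on every
    side (clipped at the image border); the halo is cropped off the result
    before it is written to the output.
    """
    height, width = image.shape
    out: Optional[np.ndarray] = None
    for y0 in range(0, height, chunk):
        for x0 in range(0, width, chunk):
            y1, x1 = min(y0 + chunk, height), min(x0 + chunk, width)
            # Window = chunk plus halo, clipped to the image bounds.
            ya, xa = max(y0 - halo, 0), max(x0 - halo, 0)
            yb, xb = min(y1 + halo, height), min(x1 + halo, width)
            result = func(image[ya:yb, xa:xb])
            if out is None:
                out = np.empty(image.shape, dtype=result.dtype)
            # Keep only the chunk's own pixels from the processed window.
            out[y0:y1, x0:x1] = result[y0 - ya : y1 - ya, x0 - xa : x1 - xa]
    return out
```

With a 3x3 filter a halo of 1 reproduces the whole-image result exactly; with a halo of 0 the pixels along chunk seams differ.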

OmarAshkar, Mar 06 '23

Hi @omashkartrx, thanks for sharing your experience. dask should indeed help with loading WSIs, since that is an IO-bound operation, and we'd appreciate your help here. However, adding dask to the MONAI dependencies should be discussed first based on the value it can bring, and exploring options in RAPIDS might also be helpful.

bhashemian, Mar 06 '23

Thanks @drbeh. I will definitely try to help, but I am not sure where to start. I will try to implement it and open a pull request to get your opinion. Thanks

OmarAshkar, Mar 06 '23

@drbeh, @Nic-Ma, @omashkartrx: I have used dask for exactly this sort of thing and have some notebooks that show it. My GTC workshop sessions this year and last year also show these approaches in action, but the same can be achieved by using threads or processes directly, which reduces the dependencies. Processes have the advantage of not being affected by the GIL, but in practice I don't see a huge difference between the two.
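Using threads or processes directly needs only the standard library. A minimal sketch, assuming a per-tile post-processing function already exists (run_on_tiles is a hypothetical helper): with concurrent.futures, switching between threads and processes is a one-line change, which makes it easy to measure whether the GIL actually matters for a given post-processing function.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, List, Sequence

import numpy as np


def run_on_tiles(
    func: Callable[[np.ndarray], np.ndarray],
    tiles: Sequence[np.ndarray],
    use_processes: bool = False,
    max_workers: int = 4,
) -> List[np.ndarray]:
    """Apply func to every tile with a thread pool or a process pool.

    Threads share memory and start cheaply, but only run func in parallel
    while it releases the GIL (as much NumPy/SciPy C code does).
    Processes are not limited by the GIL, but every tile and result is
    pickled between processes, and func must be a picklable module-level
    function.
    """
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with executor_cls(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input tiles.
        return list(pool.map(func, tiles))


tiles = [np.arange(6).reshape(2, 3) + i for i in range(5)]
flipped = run_on_tiles(np.flipud, tiles)  # thread pool by default
```

When use_processes=True on platforms whose default start method is not fork (for example Windows and macOS), the calling code has to sit under an if __name__ == "__main__": guard.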

JHancox, Mar 07 '23

Hi @Nic-Ma @KumoLiu, the post-processing performance of HoVerNet is a bottleneck for using HoVerNet efficiently, as reported by users and as seen in our own usage in MONAI Label. Do you think you could take a look and find room for improvement?

bhashemian, Apr 26 '23

One approach to try is to replace scikit-image operators with OpenCV, which is what @JHancox has already done. It should give us some speedup, since OpenCV is generally much faster than scikit-image.

bhashemian, Apr 26 '23

Hi @drbeh ,

One drawback is that OpenCV is a very large package and may cause dependency errors. @wyli, I don't remember the reason clearly: didn't you tell me before to avoid using OpenCV?

Thanks.

Nic-Ma, Apr 26 '23

That was about the ffmpeg part of OpenCV, used for video processing, which comes with some GPL licensing.

wyli, Apr 27 '23