
[multiresolutionimageinterface] Speed up patch loading

Open prerakmody opened this issue 3 years ago • 2 comments

Hi, I am attempting to write an as-fast-as-possible (TensorFlow/Python) data loader for WSI patches. I searched the issues for keywords like "fast", "speed", and "accelerate", but did not find any best practices.

This is what I have tried on the CAMELYON16 dataset. Maybe the maintainers/community can provide some insights?

# Import the ASAP lib first!
import sys
sys.path.append('C:\\Program Files\\ASAP 2.1\\bin')
import multiresolutionimageinterface as mir
import numpy as np

reader = mir.MultiResolutionImageReader()

# Step 1 - Loop over random anchor points "pre-selected" from whole-slides-images

# res = {patient_key1: {KEY_POINTS: [[x1, y1], [x2, y2], ...]}, ...}
patch_width   = ...
patch_height  = ...
patient_level = ...
 
for patient_key in res:
    
    path_img  = ...
    path_mask = ...
    wsi_img   = reader.open(str(path_img)) 
    wsi_mask  = reader.open(str(path_mask))
    ds_factor = wsi_mask.getLevelDownsample(patient_level)
    
    # Step 2 - Loop over points for a particular patient
    for point in res[patient_key][KEY_POINTS]:
        
        wsi_patch_mask = np.array(wsi_mask.getUCharPatch(int(point[0] * ds_factor), int(point[1] * ds_factor), patch_width, patch_height, patient_level))
        wsi_patch_img  = np.array(wsi_img.getUCharPatch(int(point[0] * ds_factor), int(point[1] * ds_factor), patch_width, patch_height, patient_level))

        yield wsi_patch_img, wsi_patch_mask

Full code can be found here: https://gist.github.com/prerakmody/9237b618c804ca9b99c1fd21e30de496

My concern is that I am loading many patches from the same patient (with some randomization), and once a fixed set of N patches has been loaded from a patient, I move on to the next patient. Is it possible to speed up patch loading for a single patient? Or should I load the whole image at once, even though that may lead to memory overflow?

prerakmody avatar Nov 24 '22 13:11 prerakmody

There are a couple of things you can try:

  1. Use multiprocessing and get patches from several images at once.
  2. Sample all patches once and write them to disk in a fast format for your DL library of choice (e.g. TFRecords for TensorFlow).
  3. Try to prevent reading across tile boundaries; the underlying TIFF files are tiled. If you request a region that is the same size as the tile size but starts at the center point of a tile, you will need to read 4 tiles to construct the requested patch. This is not always possible and depends on your use case, of course.
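To illustrate point 3, here is a minimal sketch of tile-aligned coordinate handling. The helper names and the tile size of 512 are assumptions (check the actual tile size of your files, e.g. with `tiffinfo`); only the arithmetic is the point.

```python
def snap_to_tile(coord, tile_size=512):
    """Round a coordinate down to the nearest tile boundary."""
    return (coord // tile_size) * tile_size

def tiles_touched(x, y, width, height, tile_size=512):
    """Number of tiles a (width x height) read starting at (x, y) intersects."""
    nx = (x + width - 1) // tile_size - x // tile_size + 1
    ny = (y + height - 1) // tile_size - y // tile_size + 1
    return nx * ny

# A tile-sized read aligned to the tile grid touches a single tile...
print(tiles_touched(0, 0, 512, 512))      # 1
# ...while the same-sized read starting at a tile's center touches 4.
print(tiles_touched(256, 256, 512, 512))  # 4

# Snapping anchor points before calling getUCharPatch avoids the extra reads:
x, y = snap_to_tile(700), snap_to_tile(1300)
print(x, y)                               # 512 1024
```

Whether the slight shift in patch position is acceptable depends on how the anchor points were sampled, as noted above.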


GeertLitjens avatar Nov 24 '22 17:11 GeertLitjens

Thanks for the suggestions! I attempted option 1 as it is feasible for my pipeline. But since .getUCharPatch() is already so fast (less than 0.1 s per access; see the test code), I did not obtain any significant improvements. Note that I used the tf.data.Dataset API. It looks like the overhead of multiprocessing adds more time than it saves.

Below is a histogram of 2000 patch accesses using .getUCharPatch() (on different (x, y) coordinates and WSIs); the x-axis is time in seconds. [histogram image]
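For reference, per-access timings like these can be collected with a small helper such as the sketch below. The names are illustrative, and the placeholder workload stands in for a real patch read; in practice you would pass a closure around .getUCharPatch.

```python
import time
from statistics import median

def time_calls(fn, n):
    """Call fn n times and return the per-call durations in seconds."""
    durations = []
    for _ in range(n):
        t0 = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - t0)
    return durations

# Placeholder workload standing in for something like
#   lambda: wsi_img.getUCharPatch(x, y, patch_width, patch_height, patient_level)
durations = time_calls(lambda: sum(range(10_000)), 100)
print(f"median over {len(durations)} calls: {median(durations):.6f} s")
```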

prerakmody avatar Nov 29 '22 16:11 prerakmody