ASAP
[multiresolutionimageinterface] Speed up patch loading
Hi, I am attempting to write an as-fast-as-possible (TensorFlow/Python) dataloader for WSI patches. I searched the issues for keywords like "fast", "speed", and "accelerate", but did not find any best practices.
This is what I have tried for the CAMELYON16 dataset. Maybe the maintainers or the community can provide some insights?
# Import ASAP lib first!
import sys
sys.path.append('C:\\Program Files\\ASAP 2.1\\bin')
import multiresolutionimageinterface as mir
import numpy as np

reader = mir.MultiResolutionImageReader()

# Step 1 - Loop over random anchor points "pre-selected" from whole-slide images
# res = {patient_key1: {KEY_POINTS: [[x1, y1], [x2, y2], ...]}}
# (the loops below run inside a generator function, hence the yield)
patch_width = ...
patch_height = ...
patient_level = ...
for patient_key in res:
    path_img = ...
    path_mask = ...
    wsi_img = reader.open(str(path_img))
    wsi_mask = reader.open(str(path_mask))
    ds_factor = wsi_mask.getLevelDownsample(patient_level)

    # Step 2 - Loop over points for a particular patient
    for point in res[patient_key][KEY_POINTS]:
        # getUCharPatch takes integer level-0 coordinates: scale the level coordinates, then cast
        x0 = int(point[0] * ds_factor)
        y0 = int(point[1] * ds_factor)
        wsi_patch_mask = np.array(wsi_mask.getUCharPatch(x0, y0, patch_width, patch_height, patient_level))
        wsi_patch_img = np.array(wsi_img.getUCharPatch(x0, y0, patch_width, patch_height, patient_level))
        yield (wsi_patch_img, wsi_patch_mask)
Full code can be found here
My concern is that I am loading many patches from the same patient (with some randomization), and only once a fixed number N of patches has been loaded from that patient do I move on to the next one. Is there a way to speed up patch loading for a single patient? Or should I load the whole image at once, even though that may exhaust memory?
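To put a rough number on the memory concern, here is a back-of-the-envelope sketch. The level-0 size below is a hypothetical value chosen only for illustration, not read from an actual slide. The helper `level_bytes` is my own, and the sketch assumes 3 channels of uint8 and a pyramid in which each level halves the resolution.

```python
def level_bytes(width0, height0, downsample, channels=3, bytes_per_sample=1):
    """Approximate size in bytes of one full pyramid level held as a dense array."""
    width = int(width0 / downsample)
    height = int(height0 / downsample)
    return width * height * channels * bytes_per_sample


# Hypothetical level-0 dimensions in pixels (illustrative only).
WIDTH0, HEIGHT0 = 100_000, 200_000

for level in range(6):
    ds = 2 ** level  # assumption: each level downsamples by a factor of 2
    gib = level_bytes(WIDTH0, HEIGHT0, ds) / 2**30
    print(f"level {level}: ~{gib:.2f} GiB")
```

Under these assumptions, level 0 alone would need roughly 56 GiB as a dense array, so loading a whole slide seems realistic only at a coarser level. For a real slide, the dimensions could presumably be read from the image object (for example via `getLevelDimensions(level)`) instead of being hard-coded.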