High RAM usage with fetch when reading multiple WSI tiles

Open GuillaumeBalezo opened this issue 1 year ago • 2 comments

Hello, I'm working on a Torch dataset that uses pyvips to read tiles directly from multiple histopathological whole-slide images (WSIs) in arbitrary order: the first tile could come from slide 1, the next one from slide 2, and so on. To read tiles within a slide, I'm using 'fetch' on a pyvips.Region instead of 'crop' on a pyvips.Image, because it is faster. However, RAM usage increases when I use 'fetch', which does not happen with 'crop'.

Here is a minimal script that reproduces the problem (with pyvips 2.2.1 and libvips 8.15):

from pathlib import Path

import pyvips
from tqdm import tqdm

# he_paths (defined elsewhere) is the collection of slide file paths.
# dataframe is a pandas DataFrame with columns:
# - slide_name: slide id, used to look up the corresponding pyvips.Image
#     in images_dict
# - x: x coordinate of the tile in the pyvips.Image
# - y: y coordinate of the tile in the pyvips.Image
# The tiles were all selected in the tissue and are ordered by slide, then
# horizontally from top left to bottom right.

TILE_SIZE = 512
images_dict = {}
for slide_path in he_paths:
    image = pyvips.Image.new_from_file(slide_path, subifd=-1, access="sequential")
    slide_name = Path(slide_path).stem
    images_dict[slide_name] = image

for idx, row in tqdm(dataframe.iterrows(), total=len(dataframe)):
    slide_name = row["slide_name"]
    image = images_dict[slide_name]
    region = pyvips.Region.new(image)
    x, y = row["x"], row["y"]
    buffer = region.fetch(x, y, TILE_SIZE, TILE_SIZE)
    del region, buffer
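
Since `fetch` returns a raw pixel buffer while the `crop` version below ends in a NumPy array, a like-for-like comparison needs a conversion step. The following is only a minimal sketch of that step, assuming 8-bit pixels stored row by row with interleaved bands; `tile_from_buffer` is a hypothetical helper name, not part of pyvips.

```python
import numpy as np


def tile_from_buffer(buffer, width, height, bands):
    """View a raw 8-bit pixel buffer as a (height, width, bands) array.

    Assumption: pixels are unsigned 8-bit, row-major, bands interleaved.
    The result is a view onto `buffer`, not a copy.
    """
    flat = np.frombuffer(buffer, dtype=np.uint8)
    expected = width * height * bands
    if flat.size != expected:
        raise ValueError(f"expected {expected} bytes, got {flat.size}")
    return flat.reshape(height, width, bands)


# Hypothetical use inside the loop above (not run here):
# tile = tile_from_buffer(buffer, TILE_SIZE, TILE_SIZE, image.bands)
```

Because the array only views the buffer, it should be copied (for example with `.copy()`) if it needs to outlive the buffer.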

Here is my RAM usage when using fetch:

[Image: ram]

The RAM increase seems to happen whenever I start loading tiles from a new slide (easy to spot because in my example the tiles are ordered by slide). When I use the crop function instead, the problem no longer occurs:

for idx, row in tqdm(dataframe.iterrows(), total=len(dataframe)):
    slide_name = row["slide_name"]
    image = images_dict[slide_name]
    x, y = row["x"], row["y"]
    tile = image.crop(x, y, TILE_SIZE, TILE_SIZE).numpy()
    del tile
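
To check whether memory really jumps at slide boundaries, one option is to record the process's peak resident set size once per run of consecutive tiles from the same slide. This is a rough sketch using only the standard library; `slide_runs`, `peak_rss` and `log_memory_per_slide` are hypothetical helper names, and the `resource` module is only available on Unix-like systems.

```python
import resource
from itertools import groupby


def slide_runs(slide_names):
    """Yield (slide_name, start_index, count) for each run of consecutive
    tiles that come from the same slide."""
    start = 0
    for name, group in groupby(slide_names):
        count = sum(1 for _ in group)
        yield name, start, count
        start += count


def peak_rss():
    """Peak resident set size of this process, as reported by getrusage
    (kilobytes on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def log_memory_per_slide(slide_names, read_tile):
    """Call read_tile(i) for every tile index and return a list of
    (slide_name, growth in peak RSS during that run)."""
    report = []
    for name, start, count in slide_runs(slide_names):
        before = peak_rss()
        for i in range(start, start + count):
            read_tile(i)
        report.append((name, peak_rss() - before))
    return report
```

Here `read_tile` would wrap either the `fetch` or the `crop` call for row `i` of the dataframe, so the same harness can compare both. Peak RSS never decreases, so this shows when memory grows but not whether it is later released.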

Currently I can't use my Torch dataset with many slides or with multiple workers. Could you help me understand what might be causing this? Also, is there a way to use fetch without the increase in RAM usage?

Thanks!

GuillaumeBalezo, Jan 18 '24 11:01