High RAM usage with fetch when reading multiple WSI tiles
Hello, I'm working on a Torch dataset that uses pyvips to read tiles directly from multiple WSI histopathological images in arbitrary order. This means the first tile could come from slide 1, the next one from slide 2, and so on. To access the tiles within a slide, I'm using 'fetch' on a pyvips.Region rather than 'crop' on a pyvips.Image, because 'fetch' is faster. However, RAM usage keeps increasing when I use 'fetch', an issue I don't encounter with 'crop'.
Here is a simple script that reproduces my problem (with pyvips 2.2.1 and libvips 8.15):
from pathlib import Path

import pyvips
from tqdm import tqdm

# he_paths (defined elsewhere) is an iterable of slide file paths.
# dataframe is a pandas DataFrame with columns:
# - slide_name: slide id that maps to the corresponding pyvips.Image in
#   images_dict
# - x: x coordinate of the tile in the pyvips.Image
# - y: y coordinate of the tile in the pyvips.Image
# The tiles were all selected in the tissue and are ordered by slide, then
# horizontally from top left to bottom right.

TILE_SIZE = 512

# Open every slide once and keep the images keyed by slide name.
images_dict = {}
for slide_path in he_paths:
    image = pyvips.Image.new_from_file(slide_path, subifd=-1, access="sequential")
    slide_name = Path(slide_path).stem
    images_dict[slide_name] = image

# Read each tile with Region.fetch, creating a new Region per tile.
for idx, row in tqdm(dataframe.iterrows(), total=len(dataframe)):
    slide_name = row["slide_name"]
    image = images_dict[slide_name]
    region = pyvips.Region.new(image)
    x, y = row["x"], row["y"]
    buffer = region.fetch(x, y, TILE_SIZE, TILE_SIZE)
    del region, buffer
Here is my RAM usage when using fetch:
The RAM increase seems to happen when I start loading tiles from a new slide (easy to detect because in my example the tiles are ordered by slide). However, when I use the crop function instead, the problem no longer occurs:
# Same loop, but reading each tile with Image.crop instead of Region.fetch.
for idx, row in tqdm(dataframe.iterrows(), total=len(dataframe)):
    slide_name = row["slide_name"]
    image = images_dict[slide_name]
    x, y = row["x"], row["y"]
    tile = image.crop(x, y, TILE_SIZE, TILE_SIZE).numpy()
    del tile
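To make the jump at slide transitions easier to quantify, a small sampler like the one below could be added to either loop. This is only a sketch and not part of the reproduction above: `SlideChangeSampler` and `peak_rss` are hypothetical helper names, and it assumes a Unix system, since it relies on the standard-library `resource` module.

```python
import resource
from typing import Callable, List, Optional, Tuple


def peak_rss() -> int:
    """Peak resident set size of this process so far.

    Units are platform-dependent: kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


class SlideChangeSampler:
    """Hypothetical helper: records memory each time the slide name changes."""

    def __init__(self, read_memory: Callable[[], int] = peak_rss) -> None:
        self._read_memory = read_memory
        self._previous: Optional[str] = None
        self.samples: List[Tuple[str, int]] = []

    def observe(self, slide_name: str) -> None:
        # Sample only on transitions so the log stays small.
        if slide_name != self._previous:
            self.samples.append((slide_name, self._read_memory()))
            self._previous = slide_name
```

Calling `sampler.observe(slide_name)` at the top of each iteration and printing `sampler.samples` at the end gives one reading per slide transition. Because this is a peak value it never decreases, so it shows growth but not whether memory is released later.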
Currently I can't use my Torch dataset with a large number of slides or with multiple workers. Could you help me understand what might be causing this issue? Also, is there a way to use fetch without causing an increase in RAM usage?
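One variation that might be worth testing is to reuse one Region per slide instead of creating a new one for every tile, and to cap how many are kept alive at once. The sketch below is an assumption to experiment with, not a confirmed fix: `BoundedCache` is a hypothetical helper, and whether it changes the RAM behaviour described here has not been verified.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class BoundedCache(Generic[K, V]):
    """Hypothetical least-recently-used cache that builds values on demand."""

    def __init__(self, factory: Callable[[K], V], max_items: int = 4) -> None:
        if max_items < 1:
            raise ValueError("max_items must be at least 1")
        self._factory = factory
        self._max_items = max_items
        self._items: "OrderedDict[K, V]" = OrderedDict()

    def get(self, key: K) -> V:
        if key in self._items:
            # Mark the entry as most recently used.
            self._items.move_to_end(key)
            return self._items[key]
        value = self._factory(key)
        self._items[key] = value
        if len(self._items) > self._max_items:
            # Drop the least recently used entry so it can be garbage collected.
            self._items.popitem(last=False)
        return value

    def __len__(self) -> int:
        return len(self._items)


# Hypothetical usage with the names from the reproduction above:
#   regions = BoundedCache(lambda name: pyvips.Region.new(images_dict[name]))
#   buffer = regions.get(slide_name).fetch(x, y, TILE_SIZE, TILE_SIZE)
```

Keeping the factory as a plain callable means the same cache could hold the open images themselves rather than regions, which would also bound how many slides stay open per worker.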
Thanks!