
BPPARAM + on disk handling on images

Open andrea-de-micheli opened this issue 1 year ago • 3 comments

Hello,

I've noticed that cytomapper::measureObjects doesn't execute with multiple workers when the images are stored on disk. Only one CPU core seems to be utilized despite running the following line:

sc_all = measureObjects(masks_disk, image = images_disk, img_id = "sample_id", BPPARAM = MulticoreParam(workers = 32))

This doesn't happen when images are referenced in memory -- multiple cores are used.

I have over 900 images and masks on disk as HDF5 files from which I would like to create a single sce object. What is the best course of action for this task?

Thank you!

andrea-de-micheli, Jan 18 '24 09:01

Hi @andrea-de-micheli,

to create a single SCE object from your images/masks, it would be more efficient to use the steinbock framework for pre-processing (described here: https://bodenmillergroup.github.io/steinbock/latest/ and also here: https://www.nature.com/articles/s41596-023-00881-0).

Afterwards, you can run read_steinbock() from the imcRtools package, which can also be executed with multiple workers via BPPARAM and is more performant than measureObjects. For more information, please refer to ?read_steinbock and the package vignette: https://bioconductor.org/packages/release/bioc/vignettes/imcRtools/inst/doc/imcRtools.html#3_Read_in_IMC_data
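A minimal sketch of what that call could look like, assuming the steinbock output sits in a single folder with the default layout. The wrapper name read_steinbock_sce, the steinbock_dir argument and the worker count are hypothetical; only path, return_as and BPPARAM are passed to read_steinbock, so please check ?read_steinbock for the other arguments.

```r
# Sketch only: read steinbock output into a SingleCellExperiment in parallel.
# `read_steinbock_sce`, `steinbock_dir` and `workers` are hypothetical names.
read_steinbock_sce <- function(steinbock_dir, workers = 8) {
  imcRtools::read_steinbock(
    path = steinbock_dir,
    return_as = "sce",  # the default return type is a SpatialExperiment
    BPPARAM = BiocParallel::MulticoreParam(workers = workers)
  )
}
```

Calling read_steinbock_sce("path/to/steinbock") would then return one object covering all images found in that folder.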

Anyhow, thanks for the heads up regarding measureObjects/BPPARAM issues with HDF5 files. I will have a closer look at this as well.

Feel free to close the issue, if this worked better for you.

Best, Lasse

lassedochreden, Jan 18 '24 10:01

Thanks for your feedback, Lasse. I'm not using IMC images and have a custom pipeline for segmentation, which is why I tried to stay away from steinbock. steinbock measure intensities runs on my data but sadly does not produce any output, maybe due to differences in file formats and directory structures. That makes it hard to troubleshoot. Any other pointers?

andrea-de-micheli, Jan 18 '24 10:01

Hi,

steinbock should work on non-IMC images as long as the file formats match. You could check: https://bodenmillergroup.github.io/steinbock/latest/cli/preprocessing/#external-images. If you run into trouble there, consider opening an issue for steinbock.

Regarding measureObjects - One option:

  1. Use loadImages with on_disk = FALSE to load the images into memory and then run measureObjects with multiple workers. To avoid memory issues, you can do this for subsets of the data and merge the results afterwards.
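A rough sketch of that batch-wise approach, not a tested recipe. It assumes that images_disk and masks_disk are CytoImageList objects in the same order with "sample_id" in their mcols, and that CytoImageList(x, on_disk = FALSE) moves a subset into memory; reloading each subset with loadImages(..., on_disk = FALSE) would be the alternative. The helper names, chunk_size and workers are hypothetical.

```r
# Sketch only: measure on-disk images in batches that fit into memory.
# `chunk_indices`, `measure_in_batches`, `chunk_size` and `workers` are
# hypothetical names chosen for illustration.

# Split the indices 1..n into consecutive batches of at most chunk_size.
chunk_indices <- function(n, chunk_size) {
  split(seq_len(n), ceiling(seq_len(n) / chunk_size))
}

measure_in_batches <- function(masks_disk, images_disk,
                               chunk_size = 50, workers = 8) {
  batches <- chunk_indices(length(images_disk), chunk_size)
  sce_list <- lapply(batches, function(idx) {
    # Assumption: this brings the selected images and masks into memory.
    imgs <- cytomapper::CytoImageList(images_disk[idx], on_disk = FALSE)
    msks <- cytomapper::CytoImageList(masks_disk[idx], on_disk = FALSE)
    cytomapper::measureObjects(
      msks, image = imgs, img_id = "sample_id",
      BPPARAM = BiocParallel::MulticoreParam(workers = workers)
    )
  })
  # Cells are stored as columns, so the per-batch objects are merged with cbind.
  do.call(cbind, unname(sce_list))
}
```

With about 900 images, a call such as measure_in_batches(masks_disk, images_disk, chunk_size = 50, workers = 32) would hold at most 50 images in memory at a time; the batch size that actually fits depends on image dimensions and available RAM.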

Will try to check the measureObjects/BPPARAM issues with HDF5 soon as well.

Best, Lasse

lassedochreden, Jan 18 '24 11:01