Performance of bioio compared to native readers

Open frauzufall opened this issue 6 months ago • 4 comments

Issue

First of all, thank you so much for your work on bioio!

We are seeing significant differences in image-reading performance between bioio and the native readers it wraps, and we would like to know whether there is potential to improve this.

This issue is related to #130.

Background

We want to use bioio for pixel-patrol, a tool for assessing image quality and consistency within and between different image collections. Bioio seems to be the perfect fit for unifying metadata readouts across formats. Therefore, we load each image of a folder using bioio and write statistics and metadata for each file into one big table.

How to reproduce

Code for benchmarking and profiling: https://gist.github.com/frauzufall/a4c5b82cafc1c9707c2c8ffd07dd1107, or run it directly via uv:

uv run https://gist.githubusercontent.com/frauzufall/a4c5b82cafc1c9707c2c8ffd07dd1107/raw/a45513e210111624174b251477c69c0ae8830ea8/benchmark_bioio_vs_native.py

Here are some statistics:

PNG:

======================================================================
               PNG Loading Speed Comparison Report              
======================================================================
Number of runs per file: 50

File Name                 | Size (MB)  | imageio (s)     | bioio (s)       | % Higher  
--------------------------------------------------------------------------------
test_image_1000x1000.png  | 0.01       | 0.005756        | 0.031467        | 446.67   %
test_image_100x100.png    | 0.00       | 0.000239        | 0.006888        | 2778.68  %
test_image_2000x2000.png  | 0.02       | 0.037071        | 0.128597        | 246.90   %
test_image_4000x4000.png  | 0.07       | 0.152327        | 0.553192        | 263.16   %
test_image_500x500.png    | 0.00       | 0.001510        | 0.012852        | 751.21   %
test_image_8000x8000.png  | 0.25       | 0.644278        | 2.054795        | 218.93   %

======================================================================
Overall Summary:
----------------------------------------------------------------------
Total average loading time across all PNG images (Imageio): 0.841181 s
Total average loading time across all PNG images (BioIO):    2.787791 s

Conclusion: BioIO (PNG) is slower than Imageio (PNG) by approximately 231.41% (Total difference: 1.946610 s).

TIFF:

======================================================================
               TIFF Loading Speed Comparison Report              
======================================================================
Number of runs per file: 50

File Name                 | Size (MB)  | tifffile (s)    | bioio (s)       | % Higher  
--------------------------------------------------------------------------------
test_image_1000x1000.tiff | 0.95       | 0.000207        | 0.003340        | 1515.81  %
test_image_100x100.tiff   | 0.01       | 0.000150        | 0.002234        | 1386.47  %
test_image_2000x2000.tiff | 3.81       | 0.000362        | 0.007007        | 1833.80  %
test_image_4000x4000.tiff | 15.26      | 0.001776        | 0.028777        | 1520.20  %
test_image_500x500.tiff   | 0.24       | 0.000153        | 0.003179        | 1980.20  %
test_image_8000x8000.tiff | 61.04      | 0.014028        | 0.104423        | 644.39   %

======================================================================
Overall Summary:
----------------------------------------------------------------------
Total average loading time across all TIFF images (Tifffile): 0.016676 s
Total average loading time across all TIFF images (BioIO):    0.148960 s

Conclusion: BioIO (TIFF) is slower than Tifffile (TIFF) by approximately 793.24% (Total difference: 0.132283 s).
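
For readers who want to reproduce numbers of this shape without opening the gist, here is a minimal sketch of how such a report could be computed. It is not the gist's actual code. The helper names `mean_load_time` and `percent_higher` are hypothetical, and the formula for "% Higher" is inferred from the summary figures above (for PNG, (2.787791 - 0.841181) / 0.841181 is roughly 231.41%).

```python
import statistics
import time
from typing import Callable


def mean_load_time(load: Callable[[str], object], path: str, runs: int = 50) -> float:
    """Return the mean wall-clock seconds of `load(path)` over `runs` calls."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        load(path)  # the result is discarded; only the elapsed time matters
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


def percent_higher(native_s: float, wrapped_s: float) -> float:
    """Relative slowdown of the wrapped reader versus the native one, in percent."""
    return (wrapped_s - native_s) / native_s * 100.0


# Hypothetical usage, assuming tifffile and bioio are installed:
#   native = mean_load_time(lambda p: tifffile.imread(p), "image.tiff")
#   wrapped = mean_load_time(lambda p: bioio.BioImage(p).data, "image.tiff")
#   print(f"{percent_higher(native, wrapped):.2f} % higher")
```

Because each run reads the same file again, the later runs most likely hit the OS file cache, so the means say more about per-call overhead than about disk throughput.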


And some screenshots from profiling (first bioio, then the native reader).

PNG (bioio with imageio): Image

PNG (imageio): Image

TIFF (bioio with tifffile): Image

TIFF (tifffile): Image

Bonus screenshot from TIFF using bioio, but with many small files (the plugin discovery comes up here more significantly): Image

Wild guesses

Without having any knowledge about the bioio implementation, it looks like the same read method is called twice? And for TIFF, tokenizing seems expensive and is also called repeatedly. Is it required?

These seem to be issues related to delayed vs direct array loading. ChatGPT is of the opinion that this line is problematic, but I don't feel competent enough to judge its significance or implications.

Also, the plugin discovery mechanism is quite costly when bioio is used in a loop over many images. Can it be cached somehow?

We wonder if there are ways to improve the performance of bioio and, if needed, are also happy to contribute to efforts in that direction.

Best, Deborah

frauzufall avatar Jul 04 '25 15:07 frauzufall

Thank you so much for this detailed write-up. I think we have been aware that tiff reading is not the most performant, and would love to try to optimize that. We would absolutely welcome any efforts toward optimizing the tiff readers that use tifffile. Will take a deeper look after the weekend.

toloudis avatar Jul 05 '25 15:07 toloudis

I tried using the dask_data loader instead of data like this:

        "bioio_load_func": lambda fp: bioio.BioImage(fp, reader=Reader).get_image_dask_data().compute(),

and got better results. (Still not awesomely fast)

With the code path you have running, we seem to be doing a dask load in order to get dims, and THEN doing a non-dask load. (The code seemed to be going through xarray_dask_data for dims, and then xarray_data. I believe the idea here was that a dask load would be delayed and do much less work but still be a generic way to get the dimensions across many file types...)

toloudis avatar Jul 06 '25 01:07 toloudis

Put together these charts to get a sense of how bioio performance scales with image size.

Image Image Image Image

pgarrison avatar Jul 07 '25 19:07 pgarrison

You will get dramatically better performance by specifying the reader (e.g., io_image = BioImage(path, reader=reader) vs. io_image = BioImage(path)). If you know which modules you have installed and which file types you are dealing with, a little bit of wrapper code to guess and specify the reader before calling BioImage really pays off.
That being said, bioio should totally be smarter about figuring it out faster.

kmitcham avatar Sep 17 '25 17:09 kmitcham