pyvips Opening and saving the same WSI with the same size

Hello, I'm trying to figure out how to open an SVS file, downsize it, and save it as a .tif without the resulting file being larger than the original.

I don't want any loss of quality in the WSI during the operation and, ideally, I would like to carry the metadata from the SVS file over into the new .tif file. For example, if I do:

import pyvips

# file is 187.5 MB on disk
img = pyvips.Image.openslideload(str(file), level=0)
img = img.resize(0.92)  # downsize to 92% in each dimension
img.tiffsave(str(file_output), compression='lzw')
# file_output is 828.2 MB on disk

Why do I get a jump in size like this? I'd like to use a lossless compression algorithm when I save the new file_output file. Also, is .tif the best format that OpenSlide/libvips can read in terms of reading speed and compression ratio?

Thanks a lot!

Mar 31 '20 11:03 godardt

Hello @EKami,

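SVS files are usually compressed with jpeg2000, which is lossy, so the decoded pixels take far more space once they are rewritten with a lossless codec such as LZW. To get back near the original size you'll need to use a lossy compressor.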
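To make the size jump concrete, here is a rough back-of-the-envelope sketch in plain Python. The slide dimensions are hypothetical (they are not taken from the file in the question), and the default of 4 bands assumes the RGBA output that openslideload normally produces. A lossless codec only has to shrink this raw figure a few times over to land well above the lossy original.

```python
def raw_size_bytes(width, height, bands=4, bytes_per_sample=1):
    """Uncompressed size of an image; bands=4 assumes 8-bit RGBA."""
    return width * height * bands * bytes_per_sample


def resized_dimensions(width, height, scale):
    """Approximate dimensions after scaling both axes by the same factor."""
    return round(width * scale), round(height * scale)


# Hypothetical level-0 dimensions, chosen only for illustration.
width, height = 40_000, 30_000

new_width, new_height = resized_dimensions(width, height, 0.92)
before = raw_size_bytes(width, height)
after = raw_size_bytes(new_width, new_height)

print(f"raw size before resize: {before / 1e9:.2f} GB")
print(f"raw size after resize:  {after / 1e9:.2f} GB")
print(f"pixel count ratio:      {after / before:.4f}")
```

Note that resizing by 0.92 only removes about 15% of the pixels (0.92 squared is roughly 0.85), so the resize itself barely offsets the cost of switching from a lossy to a lossless codec.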

I would try:

img = pyvips.Image.openslideload(str(file))
img = img.resize(0.92) 
img.tiffsave(str(file_output), compression='jpeg', Q=85, tile=True, properties=True)

The properties argument makes tiffsave write all the metadata to the IMAGEDESCRIPTION tag as XML. The TIFF needs to be tiled, or you'll hit the JPEG limit of 65,535 pixels per side.

I see:

$ vips copy CMU-1.svs x.tif[compression=jpeg,Q=85,tile,properties]
$ ls -l
total 294072
-rw-r--r-- 1 john john 177552579 Feb 10 20:30 CMU-1.svs
-rw-r--r-- 1 john john 116692123 Mar 31 14:06 x.tif

So reasonably close.
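The tiffsave call above could be wrapped in a small helper like the sketch below. The helper names (save_jpeg_tiff, needs_tiling) are hypothetical, and the dimension check is deliberately conservative: it flags any image with a side longer than 65,535 pixels. The function only calls tiffsave with the same arguments shown earlier, so it accepts any pyvips image.

```python
JPEG_MAX_DIMENSION = 65535  # largest width or height a single JPEG stream can encode


def needs_tiling(width, height):
    """Conservative check: True if either side exceeds the JPEG limit."""
    return max(width, height) > JPEG_MAX_DIMENSION


def save_jpeg_tiff(img, path, quality=85):
    """Save a pyvips image as a tiled, JPEG-compressed TIFF with metadata.

    tile=True keeps every JPEG stream small, and properties=True writes
    the image metadata into the IMAGEDESCRIPTION tag as XML.
    """
    img.tiffsave(path, compression="jpeg", Q=quality, tile=True, properties=True)


# Usage sketch (needs pyvips built with OpenSlide support and a real slide):
#   import pyvips
#   img = pyvips.Image.openslideload("CMU-1.svs")
#   save_jpeg_tiff(img.resize(0.92), "x.tif")
```

Always writing tiles, rather than switching on needs_tiling, keeps the output layout the same for small and large slides; the check is there mainly to explain why strips are not an option for big images.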

Mar 31 '20 13:03 jcupitt

Thank you so much @jcupitt !! I have another question: if I use compression='jpeg', Q=85, wouldn't I lose image quality on top of the jpeg2000 compression already applied to the SVS files during scanning?

The reason why I really want to go with lossless compression is that my ultimate goal is to be able to convert both Mirax and SVS files to the same .tif format while:

Keeping the metadata
Downsizing the WSIs
Not losing image quality (since those WSIs have to run through a deep learning algorithm and I noticed that it's very sensitive to changes in image quality, even with compression='jpeg', Q=100).

Thanks a lot!

Mar 31 '20 13:03 godardt

Yes, you'll get extra artefacts from the jpg compression.

I do deep learning directly on the WSI. Would that be an option? You can pull rects from SVS files and pass them to pytorch etc., so you don't need to go via a tiff intermediate.
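As a rough sketch of what pulling a rect can look like (this is not the benchmarked code from the linked issue): the helper below crops a square patch and wraps its bytes in a numpy array. The name read_patch is hypothetical, and the code assumes 8-bit pixels and an alpha channel in the last band, which is what openslideload normally returns.

```python
import numpy as np


def read_patch(slide, x, y, size):
    """Return a (size, size, 3) uint8 RGB array cut from a pyvips image.

    Assumes 8-bit samples; any band after the first three (usually
    alpha) is dropped.
    """
    patch = slide.crop(x, y, size, size)
    array = np.ndarray(
        buffer=patch.write_to_memory(),
        dtype=np.uint8,
        shape=[patch.height, patch.width, patch.bands],
    )
    return array[:, :, :3]


# Usage sketch (needs pyvips built with OpenSlide support and a real slide):
#   import pyvips
#   slide = pyvips.Image.openslideload("CMU-1.svs", level=0)
#   tile = read_patch(slide, 1000, 2000, 256)
#   # torch.from_numpy(tile.copy()) would then give a tensor for pytorch
```

Because the patch is read straight from the slide, no extra compression step is involved and the network sees the pixels exactly as OpenSlide decodes them.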

Mar 31 '20 14:03 jcupitt

Sample code and benchmark: https://github.com/libvips/pyvips/issues/100#issuecomment-493960943

Mar 31 '20 14:03 jcupitt

I think that'll probably be the only option for SVS files at this point, since they seem to be heavily compressed with jpeg2000. As for Mirax, I found that I have room to shrink the files, since I only need them at downsample 2.0 (level 1), which has a quarter of the pixels of the original.

Thanks a lot for your help @jcupitt, much appreciated :)
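For the Mirax case, a minimal sketch of that idea might look like the following. It assumes the slide stores level 1 at downsample 2.0, and the function names are hypothetical. Loading the stored level directly avoids an extra resize, and LZW keeps the decoded pixels unchanged; bigtiff=True is an assumption to allow outputs over 4 GB.

```python
def pixel_fraction(downsample):
    """Fraction of level-0 pixels left at a given per-axis downsample."""
    return 1.0 / (downsample * downsample)


def convert_level_lossless(src_path, dst_path, level=1):
    """Read one pyramid level through OpenSlide and save it as a lossless TIFF."""
    import pyvips  # imported here so the arithmetic helper works without libvips

    img = pyvips.Image.openslideload(src_path, level=level)
    img.tiffsave(
        dst_path,
        compression="lzw",  # lossless, so no new artefacts are introduced
        tile=True,
        bigtiff=True,
        properties=True,  # keep the metadata as XML in IMAGEDESCRIPTION
    )


print(pixel_fraction(2.0))  # a downsample of 2.0 per axis leaves 1/4 of the pixels
```

Whether the result ends up smaller than the original .mrxs data depends on how compressible the slide is, so it is worth checking on a few files first.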

Mar 31 '20 14:03 godardt