
Downsampling in Z dimension

Open nooneswarup opened this issue 1 year ago • 10 comments

I'm working on large images which require a lot of memory.

This is a sample image:

850x1900 ushort, 1 band, grey16, tiffload
width: 850
height: 1900
bands: 1
interpretation: grey16
n-pages: 791

I want to downsample this image to 0.5x the original size, giving roughly (425, 950, 395).

Using pyvips I can successfully rescale in x and y, giving (425, 950, 791).

I do not want to load the image into memory, as I have images larger than this, e.g. (7175, 8910, 2636).

Is this something I can do using vips?

nooneswarup avatar Jun 27 '24 16:06 nooneswarup

Hi @nooneswarup,

Wow that IS big. What's the exact tiff format? Are they tiled? That would make it a bit easier.

Yes, it shouldn't be too hard. I suppose I'd do a x2 shrink first, then average pairs of pages.

jcupitt avatar Jun 27 '24 17:06 jcupitt

Oh, you have an odd number of pages in your test image. How do you plan to handle that?

jcupitt avatar Jun 27 '24 17:06 jcupitt

For the odd number of pages, I was planning to drop a slice, or to round up if that's possible, similar to how Fiji does the rescaling. Is there something you'd suggest using vips?

I am loading the image with n=-1 and using crop to split it into pages. I wonder if there is a different approach I can use for the Z-dimension scaling?

pages = []
for y in range(0, image.height, page_height):
    cropped_image = image.crop(0, y, image.width, page_height)
    pages.append(cropped_image)

nooneswarup avatar Jun 27 '24 17:06 nooneswarup

I made a test image like this:

$ vips copy nipguide.pdf[dpi=300,n=-1] x.tif[tile,compression=jpeg]
$ vipsheader -a x.tif
x.tif: 2480x3508 uchar, 3 bands, srgb, tiffload
width: 2480
height: 3508
bands: 3
format: uchar
coding: none
interpretation: srgb
xoffset: 0
yoffset: 0
xres: 11.811
yres: 11.811
filename: x.tif
vips-loader: tiffload
n-pages: 58
resolution-unit: in
bits-per-sample: 8
orientation: 1

Then with this test prog:

#!/usr/bin/env python3

import sys
import pyvips

# load and split to an array of images
image = pyvips.Image.new_from_file(sys.argv[1])
pages = image.pagesplit()

# x2 xy shrink
pages = [page.resize(0.5) for page in pages]

# average pairs of pages
pages = [((pages[i] + pages[i + 1]) / 2).cast(pages[i].format)
         for i in range(0, len(pages), 2)]

# join the pages again
image = pyvips.Image.arrayjoin(pages, across=1)

# set the page height so the tiff saver knows how to break the image into
# pages
image.set("page-height", pages[0].height)

image.write_to_file(sys.argv[2])
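The pairwise averaging above assumes an even page count; with an odd count, pages[i + 1] would run off the end of the list. A small sketch for evening the count first, covering both options discussed above (drop the last slice, or duplicate it to round up; the helper name is my own):

```python
def make_even(pages, round_up=False):
    """Return a page list of even length: drop the last slice, or
    duplicate it when round_up is True."""
    if len(pages) % 2 == 0:
        return pages
    return pages + [pages[-1]] if round_up else pages[:-1]
```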

I can run:

$ /usr/bin/time -f %M:%e ~/try/zshrink.py x.tif[n=-1] x2.tif[tile,compression=jpeg]
317952:46.65
$ vipsheader -a x2.tif
x2.tif: 1240x1754 uchar, 3 bands, srgb, tiffload
width: 1240
height: 1754
bands: 3
format: uchar
coding: none
interpretation: srgb
xoffset: 0
yoffset: 0
xres: 11.811
yres: 11.811
filename: x2.tif
vips-loader: tiffload
n-pages: 29
resolution-unit: in
bits-per-sample: 8
orientation: 1

So it runs in about 47s and needs a peak of about 310 MB of memory. The whole image is 2480 * 3508 * 58 * 3 bytes, or about 1.5 GB, so it's streaming the image rather than loading it all into memory. You could get the memory use down a bit by reducing the size of the threadpool (this PC has 32 hardware threads).
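Since peak memory scales with the number of worker threads, one way to trim it is to cap libvips's threadpool with the VIPS_CONCURRENCY environment variable (the value 4 here is just an example):

```shell
$ VIPS_CONCURRENCY=4 /usr/bin/time -f %M:%e ~/try/zshrink.py x.tif[n=-1] x2.tif[tile,compression=jpeg]
```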

That's with tiled tiff. If you have a simple strip tiff, it's a little harder, I think you'd need to use sequential mode and a tilecache.

jcupitt avatar Jun 27 '24 17:06 jcupitt

Hi @jcupitt, I used the test image and tried it without the jpeg compression, dropping slices to make the z dimension even. (My end goal is to make OME-Zarr files; I know vips does not support that format yet.) Thanks!

For the original image I tried opening it with sequential: true. That works up to a certain number of n-pages, beyond which I get an error like this:

(process:39391): GLib-GObject-CRITICAL **: 12:37:06.749: value "10005930" of type 'gint' is invalid or out of range for property 'top' of type 'gint'
Error: unable to call crop
  extract_area: parameter top not set

(The crop function is limited to gint coordinates below 10000000.) Fixed: reading about 1000 slices at a time and concatenating after cropping.
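The chunking fix can be sketched as pure index arithmetic plus tiffload's page= and n= load options (chunk_ranges is a name I've made up):

```python
def chunk_ranges(n_pages, chunk=1000):
    """Yield (start, count) page ranges covering n_pages, so that each
    tiffload call stays well under the 10m-pixel coordinate limit."""
    for start in range(0, n_pages, chunk):
        yield start, min(chunk, n_pages - start)

# Each chunk would then be opened and processed separately, e.g.:
#   pyvips.Image.new_from_file(filename, page=start, n=count)
# and the processed chunks rejoined with arrayjoin.
```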

nooneswarup avatar Jun 27 '24 21:06 nooneswarup

Ah! I should have noticed. Yes, libvips has a sanity check limiting it to 10m pixels in any axis.

8.16 makes this limit configurable with the VIPS_MAX_COORD environment variable. For example:

$ VIPS_MAX_COORD=100m ./some-pyvips-prog.py

to run with a higher limit.

jcupitt avatar Jul 03 '24 14:07 jcupitt

Thank you! I was able to figure it out but I have been getting this:

Error: unable to call VipsForeignSaveTiffFile
: out of order read -- at line 8944, but line 144 requested

nooneswarup avatar Jul 03 '24 14:07 nooneswarup

The out-of-order read happens when you use sequential mode for your input files.

The best fix is to make sure they are tiled TIFF.

You can also avoid sequential mode, but then you will see very long startup times and very high disc usage.

It might be possible just to expand the tile cache, but that depends on your code and your image files; you'd need to share a complete example.

jcupitt avatar Jul 03 '24 14:07 jcupitt

Unfortunately it is not a tiled TIFF. I do not mind high disk usage or startup times.

I also do not see an 8.16 release of vips.

nooneswarup avatar Jul 03 '24 16:07 nooneswarup

8.16 is the development version. You can run git master, or wait a few months.

jcupitt avatar Jul 03 '24 16:07 jcupitt