pyvips write_to_file takes more than an hour to write images (10+ GB in size) to disk

Open developer-harishkt opened this issue 6 years ago • 2 comments

Hi, jcupitt & others,

I have been facing an issue quite similar to this. My task is to convert base tiles, i.e. the JPEGs used for the DZI format, into multi-tiled pyramidal TIFFs.

The Process:

For example, the JPEGs run from 0_0.jpeg to 146_216.jpeg, comprising 147 rows & 217 columns.

  • I first stitch all the columns of every row into strips, vertical_0.png to vertical_146.png. I build these images using PIL & numpy.vstack().
  • Once I have these, I use pyvips to merge all the verticals into a single png image, which comes to 11+ GB in size & takes more than an hour to write.
  • Once I have the single png, I convert it into a multi-tiled pyramidal tiff using the vips tiffsave command.

It would be really helpful if someone could help me speed up this process, or suggest an approach that achieves this much faster.

My Code:

To generate the vertical images:

    from os.path import join

    import numpy as np
    from PIL import Image

    for j in range(start, end):
        # Paths of all tiles that share the first index j
        imageList = [join(basePath, "{}_{}.jpeg".format(j, i))
                     for i in range(numberOfColoums)]

        imgs = [Image.open(path) for path in imageList]

        # np.vstack needs a sequence (a list), not a generator
        imgs_comb = np.vstack([np.asarray(img) for img in imgs])
        imgs_comb = Image.fromarray(imgs_comb)

        img_name = 'vertical_{}.jpg'.format(j)
        imgPath = join(tiffPath, img_name)
        imgs_comb.save(imgPath)
        print("{} Written".format(imgPath))

To generate a whole image png:

    import pyvips

    imageList = []
    for i in range(numberOfRows+1):
        img = "vertical_{}.jpg".format(i)
        imageList.append(join(tiffPath, img))

    # Start from the first strip, then attach the remaining strips one at a time
    # (looping over the whole list would merge the first strip with itself)
    imagePtr = pyvips.Image.new_from_file(imageList[0])
    for image in imageList[1:]:
        firstImage = imagePtr
        secondImage = pyvips.Image.new_from_file(image)
        print("merging image - ", image)
        imagePtr = secondImage.merge(firstImage, 'horizontal', firstImage.width, 0, mblend = 0)

    imgName = join(src_dir, slide.split("/")[0], imageNameList[slide])
    imagePtr.write_to_file(imgName + ".png")

Thanks in advance.

developer-harishkt, Nov 19 '19 05:11

Hi, libvips has a thing to do this, try:

    $ vips arrayjoin "$(echo *.jpeg)" mypyramid.tif[tile,pyramid,compression=jpeg] --across 217

It should be quick.

You'll probably get the tiles in the wrong order; it depends on the exact naming scheme you have. You can write a little Python to generate the correct order pretty simply. You'll probably also need to raise the limit on the number of files a process can open (I use 65535).
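
As a rough illustration of the ordering step, here is a minimal sketch. It assumes the tiles follow the usual DZI `column_row.jpeg` naming (swap the two indices if your files are `row_column`), and the helper names `tile_index`, `ordered_tiles` and `raise_open_file_limit` are made up for this example. It also derives the `--across` value from the names, since that has to equal the number of tiles in one row of the mosaic, and shows one way to raise the open-file limit from inside Python on Unix.

```python
import os
import re
import resource  # standard library, Unix only

# Assumed naming scheme: <column>_<row>.jpeg (or .jpg)
TILE_RE = re.compile(r"^(\d+)_(\d+)\.jpe?g$")


def tile_index(filename):
    """Return (column, row) parsed from a name such as '12_7.jpeg'."""
    match = TILE_RE.match(os.path.basename(filename))
    if match is None:
        raise ValueError("not a tile name: {}".format(filename))
    return int(match.group(1)), int(match.group(2))


def ordered_tiles(filenames):
    """Sort tiles row by row, left to right.

    Returns (sorted names, number of tiles per row).
    """
    names = list(filenames)
    # Sort on (row, column) so each row is contiguous, using numeric order
    ordered = sorted(names, key=lambda name: tile_index(name)[::-1])
    across = len({tile_index(name)[0] for name in names})
    return ordered, across


def raise_open_file_limit(wanted=65535):
    """Try to raise the soft open-file limit to `wanted`; return the soft limit in effect."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    except (ValueError, OSError):
        return soft  # the OS refused; keep the existing limit
    return wanted
```

The sorted list can then be joined with spaces and passed to `vips arrayjoin` in place of `$(echo *.jpeg)`, with `across` as the `--across` value.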

Try it on a small mosaic first.

jcupitt, Nov 19 '19 06:11

Here's a longer answer I wrote:

https://github.com/openseadragon/openseadragon/issues/1363#issuecomment-373331572

That uses sort to get the tiles in the right order:

    $ vips arrayjoin "$(ls *.jpeg | sort -t_ -k2g -k1g)" mypyramid.tif[tile,pyramid,compression=jpeg] --across 217
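
The same join can also be driven from pyvips instead of the shell. The following is a sketch under stated assumptions rather than a tested recipe: `tile_sort_key` and `build_pyramid` are hypothetical helper names, the key mirrors `sort -t_ -k2g -k1g` (second index first, then the first index), and `across` must match the number of tiles in one row of your mosaic.

```python
import os


def tile_sort_key(path):
    """Mirror `sort -t_ -k2g -k1g`: order by the second index, then the first."""
    stem = os.path.splitext(os.path.basename(path))[0]
    first, second = stem.split("_")
    return int(second), int(first)


def build_pyramid(tile_paths, across, output="mypyramid.tif"):
    """Join tiles into one mosaic and save it as a tiled, pyramidal TIFF."""
    import pyvips  # imported here so the sort helper works without libvips

    ordered = sorted(tile_paths, key=tile_sort_key)
    # Sequential access asks libvips to stream each tile top to bottom
    tiles = [pyvips.Image.new_from_file(p, access="sequential") for p in ordered]
    mosaic = pyvips.Image.arrayjoin(tiles, across=across)
    # Same options as [tile,pyramid,compression=jpeg] on the command line
    mosaic.tiffsave(output, tile=True, pyramid=True, compression="jpeg")
```

For example, `build_pyramid(glob.glob("*.jpeg"), across=217)` would correspond to the command above. As with the shell version, it is worth trying on a small mosaic first.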

jcupitt, Nov 19 '19 06:11