pyvips write_to_file takes more than an hour to write images (10+ GB in size) to disk
Hi, jcupitt & others,
I have been facing an issue quite similar to this one. My task is to convert base tiles, i.e. the JPEGs used for the DZI format, into multi-tiled pyramidal TIFFs.
The Process:
For example, the JPEG tiles run from 0_0.jpeg to 146_216.jpeg, making up 147 rows & 217 columns.
- I first stitch all the columns of every row into strips, vertical_0.jpg to vertical_146.jpg. I build these strips using PIL & numpy.vstack().
- Once I have these, I use pyvips to merge all the verticals into a single png image, which comes to 11+ GB in size, & writing it takes more than an hour.
- Once I have the single png, I convert it into a multi-tiled pyramidal tiff using the vips tiffsave command.
It would be really helpful if someone could help me speed up this process, or suggest an approach that gets the same result much faster.
My Code:
To generate the vertical images:
from os.path import join
import numpy as np
from PIL import Image

for j in range(start, end):
    # Paths of every tile in strip j: j_0.jpeg, j_1.jpeg, ...
    imageList = []
    for i in range(numberOfColoums):
        img = "{}_{}.jpeg".format(j, i)
        imageList.append(join(basePath, img))
    imgs = [Image.open(p) for p in imageList]
    # np.vstack needs a sequence of arrays, not a generator
    imgs_comb = np.vstack([np.asarray(im) for im in imgs])
    imgs_comb = Image.fromarray(imgs_comb)
    img_name = 'vertical_{}.jpg'.format(j)
    imgPath = join(tiffPath, img_name)
    imgs_comb.save(imgPath)
    print("{} Written".format(imgPath))
To generate a whole image png:
import pyvips

imageList = []
for i in range(numberOfRows+1):
    img = "vertical_{}.jpg".format(i)
    imageList.append(join(tiffPath, img))
# Start from the first strip, then attach each remaining strip to its right
# (looping over the whole list would merge the first strip with itself)
imagePtr = pyvips.Image.new_from_file(imageList[0])
for image in imageList[1:]:
    firstImage = imagePtr
    secondImage = pyvips.Image.new_from_file(image)
    print("merging image - ", image)
    imagePtr = secondImage.merge(firstImage, 'horizontal', firstImage.width, 0, mblend=0)
imgName = join(src_dir, slide.split("/")[0], imageNameList[slide])
imagePtr.write_to_file(imgName + ".png")
Thanks in advance.
Hi, libvips has a thing to do this, try:
$ vips arrayjoin "$(echo *.jpeg)" mypyramid.tif[tile,pyramid,compression=jpeg] --across 217
It should be quick.
You'll probably get the tiles in the wrong order; it depends on the exact naming scheme you have. You can write a little Python to generate the correct order pretty simply. You'll probably need to raise the limit on the number of files a process can open (I use 65535).
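For the open-file limit, something like the following might work from inside Python on Linux or macOS. It is only a sketch: raise_open_file_limit is a made-up helper name, and it relies on the standard resource module, which is not available on Windows. From a shell, ulimit -n does the same job.

```python
import resource


def raise_open_file_limit(target=65535):
    """Try to raise the soft limit on open files; return the limit now in effect."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to do
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # the soft limit cannot exceed the hard limit
    if soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass  # some systems cap the value lower; keep the old limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Call it once before opening any tiles. If the hard limit is lower than 65535, it has to be raised by the system administrator first.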
Try it on a small mosaic first.
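As a rough idea of the "little Python" for generating the order, here is a sketch that builds the file list row by row and hands it to pyvips. The helper names ordered_tile_names and build_pyramid are invented for illustration, and it assumes the first number in each file name is the horizontal position and the second the vertical one; swap the loops if your scheme is the other way round.

```python
import os


def ordered_tile_names(n_x, n_y, directory="", pattern="{x}_{y}.jpeg"):
    """List tile paths row by row: y is the outer loop, x the inner one."""
    return [
        os.path.join(directory, pattern.format(x=x, y=y))
        for y in range(n_y)
        for x in range(n_x)
    ]


def build_pyramid(names, across, out_path):
    """Join the tiles into one mosaic and save it as a tiled pyramidal TIFF."""
    import pyvips  # imported here so the name helper works without pyvips

    tiles = [pyvips.Image.new_from_file(n, access="sequential") for n in names]
    mosaic = pyvips.Image.arrayjoin(tiles, across=across)
    mosaic.tiffsave(out_path, tile=True, pyramid=True, compression="jpeg")


# Example on a small 3 x 2 mosaic (file names are hypothetical):
#   names = ordered_tile_names(3, 2, directory="tiles")
#   build_pyramid(names, across=3, out_path="small.tif")
```

Note that across has to match the number of tiles in each row of the final mosaic, so under the assumption above it would be n_x.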
Here's a longer answer I wrote:
https://github.com/openseadragon/openseadragon/issues/1363#issuecomment-373331572
That uses sort to get the tiles in the right order:
$ vips arrayjoin "$(ls *.jpeg | sort -t_ -k2g -k1g)" mypyramid.tif[tile,pyramid,compression=jpeg] --across 217
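If you would rather do the sorting in Python than in the shell, a sort key along these lines should give roughly the same order as sort -t_ -k2g -k1g: second number first, then first number, both compared numerically. tile_sort_key is just an illustrative name, and it assumes every file is named like 12_34.jpeg.

```python
import os


def tile_sort_key(path):
    """Sort key for 'A_B.jpeg' names: order by B, then by A, as integers."""
    stem = os.path.splitext(os.path.basename(path))[0]
    first, second = stem.split("_")
    return (int(second), int(first))


# Hypothetical file names, deliberately out of order
names = ["1_0.jpeg", "0_1.jpeg", "10_0.jpeg", "0_0.jpeg", "2_0.jpeg", "1_1.jpeg"]
ordered = sorted(names, key=tile_sort_key)
print(ordered)
```

Comparing as integers matters here: a plain string sort would put 10_0.jpeg before 2_0.jpeg.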