Very slow write_to_file in pyvips
Hey all, I am using the pyvips library to load thumbnails from raw files, resize them, and save them as JPEGs. The loaded thumbnails are 8192x5464; I resize each so its maximum dimension is 1024, then save it as a JPEG.
I compared the speed of pyvips against PIL for these tasks using the following code:
FOR PYVIPS:

import rawpy
import pyvips
from tqdm import tqdm

for idx, filepath in tqdm(enumerate(filepaths)):
    with rawpy.imread(filepath) as raw:
        thumb = raw.extract_thumb()
        if thumb.format == rawpy.ThumbFormat.JPEG:
            # thumb.data is already a JPEG byte string
            thumb_rgb = pyvips.Image.new_from_buffer(thumb.data, "")
            w, h = thumb_rgb.width, thumb_rgb.height
            max_d = max(w, h)
            size = 1024
            if max_d > size:
                ratio = max_d / size
                thumb_rgb = thumb_rgb.shrink(ratio, ratio)
                w, h = thumb_rgb.width, thumb_rgb.height
            thumb_rgb.write_to_file("{}.jpg".format(idx))
FOR PIL:

import rawpy
from io import BytesIO
from PIL import Image
from tqdm import tqdm

for idx, filepath in tqdm(enumerate(filepaths)):
    with rawpy.imread(filepath) as raw:
        thumb = raw.extract_thumb()
        if thumb.format == rawpy.ThumbFormat.JPEG:
            thumb_rgb = Image.open(BytesIO(thumb.data))
            w, h = thumb_rgb.size
            max_d = max(thumb_rgb.size)
            size = 1024
            if max_d > size:
                ratio = size / max_d
                # resize() expects a (width, height) tuple, not a generator
                new_size = tuple(int(ratio * s) for s in thumb_rgb.size)
                thumb_rgb = thumb_rgb.resize(new_size, Image.Resampling.LANCZOS)
            thumb_rgb.save('{}.jpg'.format(idx))
Unfortunately, the pyvips version is slower than the PIL version. If I remove the save step from both versions (the last line of each), the pyvips version is 2x faster than PIL, so there seems to be some issue with the way I am saving to JPEG from pyvips. I have tried different approaches with no luck. Please guide me here.
Thanks in advance!
Hi @nitishsaDire,
You need thumbnail_buffer. Try something like:
for idx, filepath in tqdm(enumerate(filepaths)):
    with rawpy.imread(filepath) as raw:
        thumb = raw.extract_thumb()
        thumb_rgb = pyvips.Image.thumbnail_buffer(thumb.data, 1024, size="down")
        thumb_rgb.write_to_file(f"{idx}.jpg")
edit: sorry, I missed the part about reading from RAW images.
I don't have a RAW image with an embedded JPEG thumbnail, so I've not been able to test this.
Hi John, thanks for your swift reply, much appreciated.
With your suggestion, using thumb_rgb = pyvips.Image.thumbnail_buffer(BytesIO(thumb.data).read(), 1024, size="down"), I see a 2.5x speedup. Awesome.
But this speedup does not scale under multiprocessing. I launch, say, 4 such processes and expected a ~6-8x gain compared to 4 processes using PIL, but with multiprocessing pyvips and PIL run at about the same speed.
I have tried multi-threading and multiple script instances as well. Is there any way to keep the pyvips speedup with multiprocessing?
Thanks in advance.
PS: my system has 8 cores and 8 GB of memory.
Yes, it should speed up well. For example:
$ for i in {1..100}; do cp ~/pics/wtc.jpg $i.jpg; done
$ vipsheader 1.jpg
1.jpg: 9372x9372 uchar, 3 bands, srgb, jpegload
$ time vipsthumbnail 1.jpg
real 0m0.187s
user 0m0.171s
sys 0m0.016s
$ time parallel vipsthumbnail ::: *
real 0m1.083s
user 0m22.778s
sys 0m2.529s
So one 9372x9372 JPEG takes about 190ms, and 100 JPEGs take about 1100ms in total -- roughly a 17x speedup from multiprocessing. This PC has 16 cores and 32 threads.
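A quick sanity check on those timings: comparing the estimated serial time for 100 images against the measured parallel wall time gives about 17x.

```python
# timings from the vipsthumbnail runs above
serial_one = 0.187      # seconds for one image, single process
parallel_total = 1.083  # seconds for 100 images under GNU parallel

# estimated serial time for 100 images vs. measured parallel time
speedup = 100 * serial_one / parallel_total
print(round(speedup, 1))  # 17.3
```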
Or in python:
#!/usr/bin/python3

import os
import multiprocessing
import sys

import pyvips


def thumbnail(directory, file):
    thumb = pyvips.Image.thumbnail(f"{directory}/{file}", 128)
    thumb.write_to_file(f"{directory}/tn_{file}")


def all_files(path):
    for (root, dirs, files) in os.walk(path):
        for file in files:
            yield root, file


# the guard is needed on platforms where multiprocessing spawns
# rather than forks (Windows, macOS)
if __name__ == "__main__":
    with multiprocessing.Pool() as pool:
        pool.starmap(thumbnail, all_files(sys.argv[1]))
I see:
$ time ./multi.py samples/
real 0m1.031s
user 0m21.292s
sys 0m1.125s