pyvips
Converting a TIF image to pyramidal TIFF format
Hi, I have a TIFF image of size 54757x67953. I tried to use tiffsave() and write_to_file() to convert the image to pyramidal TIFF format, but both reported the same error:
LZWDecode: Corrupted LZW table at scanline 52134
tiff2vips: read error
Hi @InfinityBox,
I would guess your TIFF has been truncated, and libtiff is failing to decompress the final few strips. Could you share the image file?
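If the file is too big to share, one way to test the truncation guess locally is to compare where the last strip (or tile) is supposed to end with the real file size. The sketch below is a hypothetical stdlib-only helper, not part of pyvips or libtiff. It only reads the first IFD (image directory) and assumes that directory itself survived, which seems likely here since libtiff got as far as scanline 52134. It handles both classic TIFF and BigTIFF (a 5.5 GB file will be BigTIFF).

```python
import struct

STRIP_OFFSETS, STRIP_BYTE_COUNTS = 273, 279
TILE_OFFSETS, TILE_BYTE_COUNTS = 324, 325
# Field types used by the offset/byte-count tags: SHORT, LONG, LONG8 (BigTIFF)
TYPE_FORMATS = {3: ("H", 2), 4: ("I", 4), 16: ("Q", 8)}


def _tag_values(f, endian, ftype, count, value_field, inline_size):
    """Decode an IFD entry, following its offset if the values don't fit inline."""
    fmt, size = TYPE_FORMATS[ftype]
    total = size * count
    if total <= inline_size:
        data = value_field[:total]
    else:
        offset_fmt = "I" if inline_size == 4 else "Q"
        (offset,) = struct.unpack(endian + offset_fmt, value_field)
        f.seek(offset)
        data = f.read(total)
    return struct.unpack(f"{endian}{count}{fmt}", data)


def strip_data_end(f):
    """Return (byte where the first image's strip/tile data ends, file size)."""
    f.seek(0)
    header = f.read(16)
    if header[:2] not in (b"II", b"MM"):
        raise ValueError("not a TIFF file")
    endian = "<" if header[:2] == b"II" else ">"
    (magic,) = struct.unpack(endian + "H", header[2:4])
    if magic == 42:    # classic TIFF: 2-byte entry count, 12-byte entries
        (ifd_offset,) = struct.unpack(endian + "I", header[4:8])
        count_fmt, inline = "H", 4
    elif magic == 43:  # BigTIFF: 8-byte entry count, 20-byte entries
        (ifd_offset,) = struct.unpack(endian + "Q", header[8:16])
        count_fmt, inline = "Q", 8
    else:
        raise ValueError("not a TIFF file")
    entry_size = 4 + 2 * inline
    f.seek(ifd_offset)
    (n,) = struct.unpack(endian + count_fmt,
                         f.read(struct.calcsize(endian + count_fmt)))
    raw = f.read(n * entry_size)
    entries = {}
    for i in range(n):
        e = raw[i * entry_size:(i + 1) * entry_size]
        tag, ftype = struct.unpack(endian + "HH", e[:4])
        (count,) = struct.unpack(endian + ("I" if inline == 4 else "Q"),
                                 e[4:4 + inline])
        entries[tag] = (ftype, count, e[4 + inline:])
    if STRIP_OFFSETS in entries:
        off_tag, len_tag = STRIP_OFFSETS, STRIP_BYTE_COUNTS
    else:
        off_tag, len_tag = TILE_OFFSETS, TILE_BYTE_COUNTS
    offsets = _tag_values(f, endian, *entries[off_tag], inline)
    lengths = _tag_values(f, endian, *entries[len_tag], inline)
    f.seek(0, 2)  # jump to the end to learn the real file size
    return max(o + c for o, c in zip(offsets, lengths)), f.tell()


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            end, size = strip_data_end(f)
        status = "TRUNCATED" if end > size else "ok"
        print(f"{path}: data ends at {end}, file is {size} bytes -> {status}")
```

If the data end is past the file size, the last strips are simply missing. If it fits, the "Corrupted LZW table" error more likely points at damaged strip contents rather than truncation.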
It is about 5.5 GB, so I don't know how to share it. But after your reply, I think I understand the cause of the problem: the image is truncated. Thank you!
As this issue is still open and the subject matches my question perfectly, I hope you don't mind if I reuse it.
I have always done this task with the CLI version of libvips, using the following syntax:
vips tiffsave "C:\i1.tif" "C:\i1.tif.p.tif" --compression jpeg --Q 50 --tile --tile-width 256 --tile-height 256 --pyramid
vips tiffsave "C:\i2.jpg" "C:\i2.jpg.p.tif" --compression jpeg --Q 50 --tile --tile-width 256 --tile-height 256 --pyramid
I want to translate that into pyvips syntax.
Looking at other issues (#179, #289), I think the relevant functions are Image.new_from_file and Image.tiffsave. So I tried this:
import pyvips
images_list = ["C:\i1.tif", "C:\i2.jpg"]
for i in images_list:
    image = pyvips.Image.new_from_file(i)
    outputfile = i + '.p.tif'
    image.tiffsave(outputfile, compression='JPEG', tile=True, tile_width=256, tile_height=256, pyramid=True)
-------------------------------------------------------------------------------------
pyvips.error.Error: no value JPEG in gtype VipsForeignTiffCompression (54418784)
pyvips: enum 'VipsForeignTiffCompression' has no member 'JPEG', should be one of:
none, jpeg, deflate, packbits, ccittfax4, lzw, webp, zstd, jp2k
- I followed the documentation, which shows all those values in uppercase. Why so?
- Are default parameter values (e.g. tile_width & tile_height) documented somewhere?
- I am pretty new to Python, so I'll take the opportunity to ask whether my code can be optimized (for example, I have no idea whether the image must be closed somehow), especially because I have a very long list of files to convert.
Many thanks in advance @jcupitt
Hi @abubelinha,
The main docs are here:
https://www.libvips.org/API/current/VipsForeignSave.html#vips-tiffsave
You need, e.g.:
image.tiffsave("x.tif", compression="jpeg", Q=50, tile=True, tile_width=256, tile_height=256, pyramid=True)
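The mapping from the `vips tiffsave` command-line flags quoted earlier to these keyword arguments is mechanical: drop the leading `--`, change `-` to `_`, and pass bare flags such as `--tile` as `True`. A small hypothetical helper (not part of pyvips) that does this translation, assuming every flag is either bare or followed by a single value:

```python
def cli_to_kwargs(args):
    """Translate vips-CLI-style flags into a dict of pyvips keyword arguments.

    Hypothetical helper: "--tile-width 256" -> {"tile_width": 256},
    a flag with no value ("--tile") -> {"tile": True}.
    """
    kwargs = {}
    i = 0
    while i < len(args):
        name = args[i]
        if not name.startswith("--"):
            raise ValueError(f"unexpected argument {name!r}")
        key = name[2:].replace("-", "_")
        nxt = args[i + 1] if i + 1 < len(args) else None
        if nxt is None or nxt.startswith("--"):
            kwargs[key] = True  # bare boolean flag
            i += 1
        else:
            # integer-looking values become ints, everything else stays a string
            kwargs[key] = int(nxt) if nxt.lstrip("-").isdigit() else nxt
            i += 2
    return kwargs
```

Usage would look like `image.tiffsave("x.tif", **cli_to_kwargs("--compression jpeg --Q 50 --tile --pyramid".split()))`.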
The big saving would be to enable sequential mode, so I'd use:
import pyvips

for filename in ["C:/i1.tif", "C:/i2.jpg"]:
    image = pyvips.Image.new_from_file(filename, access="sequential")
    image.tiffsave(f"{filename}.p.tif", compression="jpeg", Q=50, tile=True, tile_width=256, tile_height=256, pyramid=True)
Note the forward (not back) slashes. You can use python multiprocessing to run several conversions at the same time, which will speed things up further.
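A minimal sketch of that multiprocessing suggestion for a long list of files. Assumptions: the script name, the glob patterns, the `pending_inputs`/`convert` helpers and `max_workers=4` are all hypothetical. pyvips is imported inside the worker, so the file-filtering helper can be used without libvips, and existing outputs are skipped so re-running over a folder does not convert `x.tif.p.tif` again.

```python
import glob
import os
import sys
from concurrent.futures import ProcessPoolExecutor

SUFFIX = ".p.tif"  # output suffix, matching the example above


def pending_inputs(paths):
    """Keep existing files that are not outputs themselves and have no output yet."""
    return [p for p in paths
            if os.path.isfile(p)
            and not p.endswith(SUFFIX)
            and not os.path.exists(p + SUFFIX)]


def convert(filename):
    import pyvips  # imported in each worker process
    image = pyvips.Image.new_from_file(filename, access="sequential")
    image.tiffsave(filename + SUFFIX, compression="jpeg", Q=50,
                   tile=True, tile_width=256, tile_height=256, pyramid=True)
    return filename


if __name__ == "__main__":
    # Usage (hypothetical): python convert_all.py "C:/Temp/*.tif" "C:/Temp/*.jpg"
    # glob is used because the Windows shell does not expand wildcards itself.
    paths = [p for pattern in sys.argv[1:] for p in glob.glob(pattern)]
    files = pending_inputs(paths)
    if files:
        # Each worker also runs its own libvips threadpool, so a few workers
        # are usually enough; 4 is an arbitrary starting point.
        with ProcessPoolExecutor(max_workers=4) as pool:
            for done in pool.map(convert, files):
                print("converted", done)
```

The `if __name__ == "__main__":` guard matters on Windows, where worker processes re-import the script.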
Regarding uppercase: you can pass things like compression as enums or as strings. So, e.g.:
image.tiffsave("xx", compression=pyvips.enums.ForeignTiffCompression.JPEG)
image.tiffsave("xx", compression="jpeg")
are equivalent. I find the strings more convenient, but some people prefer enums. Your IDE should autocomplete the enums as you type, so they aren't much more effort.
Thanks a lot for your really helpful comments @jcupitt !
I am a bit surprised that my (very simple) script's memory consumption was more or less the same when using sequential mode. Also, the speed gain was not that big (about 2.75%).
I probably don't know how to measure it correctly. I used memory_profiler like this:
Sequential:
C:\Python38\python -m memory_profiler dvd_vips.py
C:/Temp/1.tif [139398044 bytes] ---> C:/Temp/1.tif.pyr.tif [2446984 bytes]
C:/Temp/2.tif [180014936 bytes] ---> C:/Temp/2.tif.pyr.tif [4059742 bytes]
C:/Temp/3.tif [189432896 bytes] ---> C:/Temp/3.tif.pyr.tif [3983902 bytes]
C:/Temp/4.tif [199467632 bytes] ---> C:/Temp/4.tif.pyr.tif [4939036 bytes]
RUNNING TIME: 55.98920249938965
Filename: dvd_vips.py
Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    10   25.742 MiB   25.742 MiB           1   @profile
    11                                         def pyvipstest():
    12   25.742 MiB    0.000 MiB           1       import os,time
    13   25.746 MiB    0.004 MiB           1       start_time = time.time()
    14   25.746 MiB    0.000 MiB           1       vipshome = config["vipshome"]
    15   25.766 MiB    0.020 MiB           1       os.environ['PATH'] = vipshome + ';' + os.environ['PATH']
    16   39.445 MiB   13.680 MiB           1       import pyvips
    17   39.445 MiB    0.000 MiB           1       path = "C:/Temp"
    18   39.445 MiB    0.000 MiB           1       images_list = os.listdir(path)
    19   44.910 MiB   -4.805 MiB           7       for i in images_list:
    20   44.910 MiB   -3.203 MiB           6           inputfile = path + '/' + i
    21   44.910 MiB   -3.203 MiB           6           outputfile = path + '/' + i + '.pyr.tif'
    22   44.910 MiB   -3.203 MiB           6           if os.path.isfile(inputfile):
    23   44.992 MiB    0.789 MiB           4               image = pyvips.Image.new_from_file(inputfile, access="sequential")
    24   44.910 MiB    3.070 MiB           4               image.tiffsave(outputfile, compression='jpeg', Q=70, tile=True, tile_width=256, tile_height=256, pyramid=True)
    25   44.910 MiB   -3.199 MiB           8               print("{} [{} bytes] ---> {} [{} bytes]" \
    26   44.910 MiB   -1.602 MiB           4                   .format(inputfile, str(os.stat(inputfile).st_size) , outputfile , str(os.stat(outputfile).st_size)))
    27   43.309 MiB   -1.602 MiB           1       end_time = time.time()
    28   43.309 MiB    0.000 MiB           1       print("RUNNING TIME: ", end_time - start_time)
Non-sequential:
C:\Python38\python -m memory_profiler dvd_vips.py
C:/Temp/1.tif [139398044 bytes] ---> C:/Temp/1.tif.pyr.tif [2446984 bytes]
C:/Temp/2.tif [180014936 bytes] ---> C:/Temp/2.tif.pyr.tif [4059742 bytes]
C:/Temp/3.tif [189432896 bytes] ---> C:/Temp/3.tif.pyr.tif [3983902 bytes]
C:/Temp/4.tif [199467632 bytes] ---> C:/Temp/4.tif.pyr.tif [4939036 bytes]
RUNNING TIME: 57.56929278373718
Filename: dvd_vips.py
Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    10   25.711 MiB   25.711 MiB           1   @profile
    11                                         def pyvipstest():
    12   25.711 MiB    0.000 MiB           1       import os,time
    13   25.715 MiB    0.004 MiB           1       start_time = time.time()
    14   25.715 MiB    0.000 MiB           1       vipshome = config["vipshome"]
    15   25.734 MiB    0.020 MiB           1       os.environ['PATH'] = vipshome + ';' + os.environ['PATH']
    16   39.613 MiB   13.879 MiB           1       import pyvips
    17   39.613 MiB    0.000 MiB           1       path = "C:/Temp"
    18   39.613 MiB    0.000 MiB           1       images_list = os.listdir(path)
    19   43.621 MiB   -0.668 MiB           7       for i in images_list:
    20   43.621 MiB   -0.668 MiB           6           inputfile = path + '/' + i
    21   43.621 MiB   -0.668 MiB           6           outputfile = path + '/' + i + '.pyr.tif'
    22   43.621 MiB   -0.668 MiB           6           if os.path.isfile(inputfile):
    23   42.906 MiB    0.301 MiB           4               image = pyvips.Image.new_from_file(inputfile)
    24   43.621 MiB    3.512 MiB           4               image.tiffsave(outputfile, compression='jpeg', Q=70, tile=True, tile_width=256, tile_height=256, pyramid=True)
    25   43.621 MiB   -3.926 MiB           8               print("{} [{} bytes] ---> {} [{} bytes]" \
    26   43.621 MiB   -0.668 MiB           4                   .format(inputfile, str(os.stat(inputfile).st_size) , outputfile , str(os.stat(outputfile).st_size)))
    27   43.621 MiB    0.000 MiB           1       end_time = time.time()
    28   43.621 MiB    0.000 MiB           1       print("RUNNING TIME: ", end_time - start_time)
That was a small set of 4 scanned TIFF images (A3-sized, 400 dpi, between 139 and 200 MB in size, 6500x10000 pixels).
Surprisingly for me, if I add a set of 16 small JPEG mobile-phone photographs (<1 MB each, 1488x1488 pixels) to my C:/Temp folder and re-run the same script, I see a big difference:
- sequential access finishes in 54.36 seconds, max. mem usage 51.246 MB
- non-sequential finishes in 79.13 seconds, max. mem usage 139.297 MB
So, a couple of questions:
- sequential: how is it possible that, after adding more work (16 JPEG images on top of the previous 4 TIFFs), the script runs faster (54.36 s vs the previous 55.98 s)?
- non-sequential: so is converting small JPEGs much harder work than converting big TIFFs?
Thanks in advance, and sorry about my newbie questions.
It's to do with the way libvips opens files. There's a chapter in the docs, if you've not seen it:
https://www.libvips.org/API/current/How-it-opens-files.md.html
It explains what seq mode does and how it affects speed and memory use.
I usually benchmark like this:
#!/usr/bin/python3
# usage: ./convert-pyr.py random|sequential image1 [image2 ...]
import sys
import pyvips

for filename in sys.argv[2:]:
    image = pyvips.Image.new_from_file(filename, access=sys.argv[1])
    image.tiffsave(f"{filename}.p.tif",
                   compression="jpeg",
                   Q=50,
                   tile=True,
                   tile_width=256,
                   tile_height=256,
                   pyramid=True)
Then with a large JPEG image:
$ vipsheader ~/pics/st-francis.jpg
/home/john/pics/st-francis.jpg: 30000x26319 uchar, 3 bands, srgb, jpegload
$ /usr/bin/time -f %M:%e ./convert-pyr.py random ~/pics/st-francis.jpg
421388:12.28
$ /usr/bin/time -f %M:%e ./convert-pyr.py sequential ~/pics/st-francis.jpg
468164:8.69
The two numbers are peak memory use in KB and elapsed time in seconds.
But there are a couple of problems: the memory use does not include the temporary file that libvips has to make for random access mode, and there's very little parallelism here, so the libvips threadpool actually makes things slower.
I would turn off threading, and force it to keep the temporary file in memory:
$ VIPS_DISC_THRESHOLD=-1 VIPS_CONCURRENCY=1 /usr/bin/time -f %M:%e ./convert-pyr.py random ~/pics/st-francis.jpg
2424840:10.24
$ VIPS_DISC_THRESHOLD=-1 VIPS_CONCURRENCY=1 /usr/bin/time -f %M:%e ./convert-pyr.py sequential ~/pics/st-francis.jpg
156688:7.72
So now it's 2.4 GB for random and 150 MB for seq, and seq takes about 25% less time.
(This PC has 32 cores; on machines with fewer cores the threading overhead will often be less.)
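For anyone who wants the same numbers from Python rather than the shell, here is a rough port of that `/usr/bin/time` benchmark, as a sketch only. The helper name is hypothetical; it relies on `os.wait4`, which exists on Linux and macOS but not on Windows; and `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS. It runs the `convert-pyr.py` script above in a child process with the same environment variables, so each measurement covers exactly one child.

```python
import os
import subprocess
import sys
import time


def peak_rss_and_time(cmd, extra_env=None):
    """Run cmd as a child process; return (exit code, peak RSS, elapsed seconds).

    Peak RSS comes from os.wait4's rusage for this child only
    (kilobytes on Linux, bytes on macOS). POSIX only.
    """
    env = dict(os.environ, **(extra_env or {}))
    start = time.perf_counter()
    proc = subprocess.Popen(cmd, env=env)
    _, status, usage = os.wait4(proc.pid, 0)
    elapsed = time.perf_counter() - start
    # We reaped the child ourselves, so record its exit code on the Popen object.
    proc.returncode = os.waitstatus_to_exitcode(status)
    return proc.returncode, usage.ru_maxrss, elapsed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python bench.py IMAGE  (assumes convert-pyr.py sits alongside)
    for access in ("random", "sequential"):
        code, peak, secs = peak_rss_and_time(
            [sys.executable, "convert-pyr.py", access, sys.argv[1]],
            {"VIPS_DISC_THRESHOLD": "-1", "VIPS_CONCURRENCY": "1"})
        print(f"{access}: exit {code}, peak {peak}, {secs:.2f} s")
```

`RUSAGE_CHILDREN` from the `resource` module is deliberately avoided here: it reports the largest child seen so far, so a big random-mode run would hide a smaller sequential one.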
Hi, I am trying to convert a WSI in SCN format to NDPI format, but I get this error:
raise Error('unable to write to file {0}'.format(vips_filename))
pyvips.error.Error: unable to write to file b'H14147-08A HES_2015-07-29 13_00_25.ndpi'
VipsForeignSave: "H14147-08A HES_2015-07-29 13_00_25.ndpi" is not a known file format
Here is the code:
current_img_400x = vips.Image.new_from_file(os.path.join(directory, fname), level=0, autocrop=True)
current_img_400x.cast("uchar")[0:3].write_to_file(f"{new_fname}.{format_to_save}", tile=True, compression="jpeg", pyramid=True)
Any suggestions on how to do it? I tried loading the same slide with OpenSlide, but it includes some extra white space, which causes trouble with masks, and I can't find an equivalent of pyvips' autocrop option on the OpenSlide object.
Hi, libvips can't write .ndpi files; it can only read them.
You could make a standard pyramidal TIFF, would that work?
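Following that suggestion, here is a sketch of the same load with the output written as a tiled pyramidal TIFF instead. libvips picks the saver from the filename suffix, which is why the `.ndpi` name failed with "is not a known file format". The function names, `Q=90`, and `bigtiff=True` are assumptions, not from the thread; `extract_band(0, n=3)` does the same job as the original `[0:3]` slice.

```python
import os


def tiff_name(fname):
    """Hypothetical helper: replace the slide's extension with .tif."""
    stem, _ = os.path.splitext(fname)
    return stem + ".tif"


def slide_to_pyramidal_tiff(directory, fname, out_dir):
    """Hypothetical wrapper: load level 0 with autocrop, save a pyramidal TIFF."""
    import pyvips  # imported here so tiff_name works without libvips installed
    slide = pyvips.Image.new_from_file(os.path.join(directory, fname),
                                       level=0, autocrop=True)
    rgb = slide.cast("uchar").extract_band(0, n=3)  # keep the first three bands
    out = os.path.join(out_dir, tiff_name(fname))
    # bigtiff=True is an assumption, in case the pyramid grows past 4 GB
    rgb.tiffsave(out, tile=True, compression="jpeg", Q=90,
                 pyramid=True, bigtiff=True)
    return out
```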