pyvips
Determine Byte Order of Current Image
I am wondering how to determine the byte order of the currently loaded image using PyVIPS?
I noticed that libvips appears to offer a suitable method for determining byte order, vips_image_isMSBfirst; however, this method does not appear to be accessible in PyVIPS for use from Python.
Is there another way to determine the byte order of the currently loaded image in Python, or would it be at all possible for a bridging method to be added to pyvips.Image that calls the underlying C function vips_image_isMSBfirst and returns True for MSB ordered images and False otherwise?
I believe this is the appropriate method to determine byte order in the C API:
https://www.libvips.org/API/current/method.Image.isMSBfirst.html
gboolean vips_image_isMSBfirst (VipsImage* image)
Thank you kindly for any advice or help you can offer.
Hi @bluebinary,
libvips records the byte order of the writer, then when reading images back, will byte-swap if the order doesn't match native order. So (hopefully!) you shouldn't need to know.
What's the use-case?
Hi @jcupitt, thank you so much for your quick reply and for all your work on VIPS over the years; it is such an important tool for the community.
The use case is decoding raw EXIF, IPTC, and XMP metadata payloads obtained via pyvips.Image.get(), and when generating updated versions of the payloads, being able to match the byte order used by the image. This is to support reading from and embedding collections-related metadata into the images from our Museum, Research Library and Archival Collections.
I was hoping that there would be a way to determine the byte order via a call to VIPS to save opening the image file separately to parse its header and magic number. We are almost exclusively dealing with Pyramidal Big TIFF files for this use case.
Could it be the case that VIPS also automatically accounts for the raw metadata payloads held in these fields and adjusts them when they are retrieved via .get() or updated via .set_type(), and if so, which byte order does the library default to?
Thank you kindly for any advice you can offer.
Ah I see, I hadn't thought of the metadata. Don't the decode libraries for those blobs handle byte order for you? I think libexif does, XMP shouldn't matter, and I've no idea about IPTC.
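To illustrate the point about EXIF: a TIFF-structured EXIF payload carries its own byte order marker, so a decoder can work it out from the blob itself. Below is a minimal sketch, assuming the blob is laid out the way JPEG APP1 EXIF segments are (an optional `Exif\0\0` prefix followed by a TIFF header). The function name `exif_byte_order` is hypothetical, and whether a given loader hands you the blob in exactly this form is an assumption worth checking against your own files.

```python
import struct

# Optional prefix found on EXIF payloads taken from JPEG APP1 segments.
EXIF_PREFIX = b"Exif\x00\x00"


def exif_byte_order(blob: bytes) -> str:
    """Return 'little' or 'big' for a TIFF-structured EXIF payload.

    Hypothetical helper: assumes the payload starts with a TIFF header,
    optionally preceded by the 'Exif\\0\\0' prefix.
    """
    if blob.startswith(EXIF_PREFIX):
        blob = blob[len(EXIF_PREFIX):]
    if len(blob) < 4:
        raise ValueError("payload too short to contain a TIFF header")

    marker = blob[:2]
    if marker == b"II":      # "Intel" order, least significant byte first
        order, fmt = "little", "<H"
    elif marker == b"MM":    # "Motorola" order, most significant byte first
        order, fmt = "big", ">H"
    else:
        raise ValueError(f"unrecognised byte order marker: {marker!r}")

    # The 16-bit value after the marker must be 42 in the declared order.
    (magic,) = struct.unpack(fmt, blob[2:4])
    if magic != 42:
        raise ValueError(f"unexpected TIFF magic number: {magic}")
    return order
```

With pyvips, the input would presumably come from something like `image.get("exif-data")`, after checking `image.get_typeof("exif-data") != 0` to confirm the field is present. This only tells you the order used inside the EXIF blob, which need not match the byte order of the containing file.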
The libvips automatic byteswapping is only for vips format files (.v); for other formats we rely on the file format libraries (like libtiff) to handle byte order. This means vips_image_isMSBfirst() probably won't help you.
libtiff has functions like:
http://www.simplesystems.org/libtiff//functions/TIFFquery.html#c.TIFFIsByteSwapped
Maybe that could work?
Thank you for the clarification, and for the libtiff library function recommendation.
The TIFF and JPEG metadata parsing libraries I have used or have written have relied on accessing raw image data to inspect the magic number to determine byte order. I am not sure whether the raw file header information is still available to a user once the image has been loaded into VIPS via, say, libtiff, if the byte order conversion is being handled automatically.
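For reference, the magic number check described above only needs the first four bytes of the file, so it is cheap even for very large pyramidal files. Here is a minimal sketch that reads the header directly, outside of libvips. The names `TiffHeader` and `read_tiff_header` are hypothetical. The byte order markers (`II` / `MM`) and version numbers (42 for classic TIFF, 43 for BigTIFF) come from the TIFF and BigTIFF file layouts.

```python
import struct
from typing import NamedTuple


class TiffHeader(NamedTuple):
    byte_order: str   # "little" or "big"
    bigtiff: bool     # True for BigTIFF (version 43), False for classic TIFF


def read_tiff_header(path) -> TiffHeader:
    """Read the byte order and TIFF flavour from the first four bytes of a file."""
    with open(path, "rb") as f:
        head = f.read(4)
    if len(head) < 4:
        raise ValueError("file too short to be a TIFF")

    if head[:2] == b"II":      # least significant byte first
        order, fmt = "little", "<H"
    elif head[:2] == b"MM":    # most significant byte first
        order, fmt = "big", ">H"
    else:
        raise ValueError(f"not a TIFF byte order marker: {head[:2]!r}")

    # The version field is stored in the byte order just declared.
    (version,) = struct.unpack(fmt, head[2:4])
    if version == 42:
        return TiffHeader(order, False)
    if version == 43:
        return TiffHeader(order, True)
    raise ValueError(f"unexpected TIFF version number: {version}")
```

Calling `read_tiff_header("image.tif")` before or after loading the same path with pyvips does mean opening the file a second time, but only four bytes are read.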
I have a related discussion question regarding embedding metadata into pre-existing JPEG-compressed PTIFF files, and needing to do so repeatedly over time, and I wished to understand if there may be a potential impact on image quality if the only operation being performed between opening and saving the image was to update one or more of its metadata payloads:
https://github.com/libvips/libvips/discussions/4567
I may have overlooked something in the documentation, but I wished to understand if the approach I was hoping to use made sense, and how we could avoid possible accumulation of JPEG-compression related artefacts through the repeated image read and save cycles.
No, libvips doesn't record the byte order reported by the TIFF loader. I think you'd need to call libtiff directly to find that out.
We could add a libvips metadata field to note this, perhaps tiff-byteswapped, meaning the libtiff loader that made this image detected non-native byte order. But of course the problem then is when to turn this flag off! It'll become incorrect after a while, it's not clear to me exactly when this would be, and a flag that's unreliable is probably worse than no flag.
libvips will always decode and recode pixels (it's an image processing library, not a metadata library), so yes, JPEG errors will accumulate.
TIFF doesn't let you modify metadata in a file, so you always have to read the whole file and write it all back again.
I think your choices are:
- Find a TIFF metadata edit library, or build your own, perhaps on top of tifffile, the python one. You should be able to losslessly update image metadata, though it might well be a bit slow, especially for large files. You'd need to do some benchmarking to see if it met your needs.
- You could use libvips, but it would decode and recode pixels, so errors would accumulate. Some libjpeg implementations aim to minimise decode / recode errors (e.g. the one in libjxl, I think), so picking one of these would help a lot. Picking the correct Q factor would be important (libvips does not do this for you!). Again, you'd need to do some experiments.
- It's difficult to update metadata in TIFF, so if it's likely to change, it's much better to store it somewhere else. For example, you could store filename.xml alongside filename.tif and put the XMP in that.
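The sidecar option in the last point could look something like the sketch below, which keeps the XMP packet in a file next to the image and never touches the TIFF itself. The helper names (`sidecar_path`, `write_xmp_sidecar`, `read_xmp_sidecar`) are hypothetical, and the `.xml` suffix simply follows the filename.xml example above.

```python
from pathlib import Path
from typing import Optional


def sidecar_path(image_path) -> Path:
    """Map e.g. 'scan.tif' to 'scan.xml' in the same directory."""
    return Path(image_path).with_suffix(".xml")


def write_xmp_sidecar(image_path, xmp_packet: str) -> Path:
    """Store the XMP packet next to the image, leaving the image untouched."""
    target = sidecar_path(image_path)
    target.write_text(xmp_packet, encoding="utf-8")
    return target


def read_xmp_sidecar(image_path) -> Optional[str]:
    """Return the stored XMP packet, or None if no sidecar exists yet."""
    target = sidecar_path(image_path)
    if not target.exists():
        return None
    return target.read_text(encoding="utf-8")
```

Because the pixels are never decoded or re-encoded, the metadata can be updated as often as needed without any effect on image quality.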
"Find a TIFF metadata edit library, or build your own, perhaps on top of tifffile, the python one"
For updating the metadata within a TIFF without touching the raw pixel data, I can also recommend the tifftools (https://github.com/DigitalSlideArchive/tifftools) python library. It gives you low-level access to the metadata and allows efficient and clean injection/updating of metadata (at even the individual IFD or SubIFD level if you need that) without touching the raw pixel data. tifftools will, of course, also tell you the byte order of your TIFF.
Alternatively, if you just need a simple command line tool, there's always the libtiff tiffset utility. This is handy for updating basic tags such as ImageDescription or Resolution, but may not work as well for IPTC blobs etc.