jhove
How to deal with Unknown TIFF IFD tag INFO messages?
At the National Archives of the Netherlands we're ingesting many TIFF scans. JHOVE reports several INFO messages of the form Unknown TIFF IFD tag: [number] for each scan. Multiplied across thousands of scans, this gives us very large log files (in our Preservica EE-based repository) and long-running log file analysis scripts. What is the best way to limit the number of these messages?
An example of a publicly available TIFF file can be found here: https://www.nationaalarchief.nl/onderzoeken/archief/1.05.11.13/invnr/230/file/NL-HaNA_1.05.11.13_230_0002 (direct download link: https://service.archief.nl/gaf/api/file/v1/original/2c1e155a-696c-4c6d-9468-d3d6ab2a1ec4). For this file, JHOVE 1.24.1 reports the following values for [number]: 36868, 37510, 40091-5, 40961 and 65001.
Should JHOVE include all known public and private TIFF IFD tags (see e.g. https://www.awaresystems.be/imaging/tiff/tifftags.html and https://www.loc.gov/preservation/digital/formats/content/tiff_tags.shtml)? Or can we remove these messages from JHOVE's output using parameters? Or is there another approach?
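In the meantime, one possible stopgap on the log-analysis side is to collapse these messages into a per-tag count before further processing. The sketch below is only an illustration: it assumes each message appears on its own line and contains the text "Unknown TIFF IFD tag: [number]" as quoted above. The function name `collapse_unknown_tags` and the sample lines are made up for this example; the real layout of a JHOVE or Preservica log line may differ.

```python
import re
from collections import Counter

# Assumed message shape, taken from the wording quoted in this issue.
# The surrounding layout of a real log line is NOT known here.
UNKNOWN_TAG = re.compile(r"Unknown TIFF IFD tag:\s*(\d+)")


def collapse_unknown_tags(lines):
    """Split log lines into (kept_lines, counts).

    kept_lines: every line that does not carry an unknown-tag message.
    counts: Counter mapping tag number -> number of times it was reported.
    """
    kept = []
    counts = Counter()
    for line in lines:
        match = UNKNOWN_TAG.search(line)
        if match:
            counts[int(match.group(1))] += 1
        else:
            kept.append(line)
    return kept, counts


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "INFO Unknown TIFF IFD tag: 36868",
        "INFO Unknown TIFF IFD tag: 37510",
        "some other log line",
        "INFO Unknown TIFF IFD tag: 36868",
    ]
    kept, counts = collapse_unknown_tags(sample)
    for line in kept:
        print(line)
    for tag, n in sorted(counts.items()):
        print(f"Unknown TIFF IFD tag {tag}: reported {n} time(s)")
```

This does not change what JHOVE emits; it only shrinks what the analysis scripts have to read, while keeping a record of which tag numbers were seen.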
We are looking at implementing a filter system for messages/validation to allow users to ignore messages based on a config file with IDs. More details to follow.
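As a rough illustration of the idea, and not a description of how JHOVE will implement it, an ID-based ignore list could work along the following lines. Everything here is an assumption for the sake of the sketch: the config format (one ID per line, `#` for comments), the `Message` type, the function names, and the placeholder IDs `EXAMPLE-1` and `EXAMPLE-2`, which are not real JHOVE message IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    message_id: str  # hypothetical identifier, not a real JHOVE ID
    text: str


def parse_ignore_list(config_text):
    """Parse an ignore list: one message ID per line.

    Blank lines are skipped, and anything after '#' is treated as a comment.
    """
    ids = set()
    for raw in config_text.splitlines():
        entry = raw.split("#", 1)[0].strip()
        if entry:
            ids.add(entry)
    return ids


def filter_messages(messages, ignored_ids):
    """Return only the messages whose ID is not in the ignore list."""
    return [m for m in messages if m.message_id not in ignored_ids]


if __name__ == "__main__":
    # Hypothetical config contents; in practice this would be read from a file.
    config_text = """
    # IDs of messages to suppress
    EXAMPLE-1   # placeholder for the unknown-tag INFO message
    """
    messages = [
        Message("EXAMPLE-1", "Unknown TIFF IFD tag: 36868"),
        Message("EXAMPLE-2", "some other message"),
        Message("EXAMPLE-1", "Unknown TIFF IFD tag: 65001"),
    ]
    ignored = parse_ignore_list(config_text)
    for m in filter_messages(messages, ignored):
        print(m.message_id, m.text)
```

Filtering on IDs rather than on message text would keep the ignore list stable even if the wording of a message changes between releases.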