[FEATURE REQUEST] More complete camera RAW metadata available through OIIO

Open demoulinv opened this issue 1 year ago • 19 comments

Extract metadata related to Color Temperature, Focus range and Auto Lens Correction status

Would it be possible to make available through OIIO specific metadata that is useful for color management and lens correction? For a given image, this metadata is available from ExifTool but cannot be read through OIIO. The attached files show what can be read with OIIO and ExifTool for 2 raw images (Canon CR2 and Sony ARW):

BCAM7025_Canon_OIIO_metadata.txt
BCAM7025_Canon_ExifTool_metadata.txt
DSC01727_Sony_OIIO_metadata.txt
DSC01727_Sony_ExifTool_metadata.txt

All metadata related to Color Temperature, Focus range and Auto Lens Correction status would be very helpful for raw color management and lens distortion correction. For example, in the attached Canon ExifTool file they appear as: "Color Temperature", "Focus Distance Upper", "Focus Distance Lower", "Peripheral Lighting", "Distortion Correction", "Chromatic Aberration Corr".

Another option would be to interface OIIO with ExifTool and to deliver the ExifTool metadata through OIIO.

demoulinv avatar Aug 30 '22 07:08 demoulinv

Can you provide the CR2 and ARW images so we can see what additional data we are missing and if there's any we can get to it? Any samples will do, but preferably something where the image is not proprietary and at the lowest resolution you can make, so it's permissible content and reasonable size to check in as a test case.

OIIO by itself is pretty good at the straight-up Exif, but it's the data in the "maker info" -- which tends to be quite different for every camera brand and model -- that is really tricky. The core problem is that we rely on libraw to read the camera raw images, including this metadata, and libraw's decoding of the fields is sometimes spotty and also changes from version to version (so merely building against an older libraw may make OpenImageIO unable to decipher some metadata that it would understand if it were built against the latest libraw).

There's a new ASWF project, https://github.com/AcademySoftwareFoundation/rawtoaces, which currently I think also uses libraw underneath, but if I understand correctly, there has been talk of whether it should bypass libraw and do the work itself in order to overcome the many shortcomings of libraw. If that were to come to pass, I could foresee a day when OIIO just uses rawtoaces instead of libraw, if it's not losing functionality in the process. Maybe @kdt3rd would like to comment, I don't want to misrepresent the goals of that project.

I like the idea of OIIO relying on some 3rd party library for this, because if OIIO tried to fully implement its own support for deciphering the constantly changing array of camera raw formats, well, that's potentially its own high-labor task on par with the complexity of the rest of OIIO combined.

So I'd prefer to outsource it to a dependency, but if somebody wanted to propose just incorporating that functionality directly into OIIO and controlling our own destiny here, I would be amenable to that as well -- provided that the proposal include specific people who are willing to take charge of that task and its continued maintenance. It can't just be "oh, Larry should do that, too," because I just don't have the bandwidth or expertise or access to the camera to do an adequate job of it.

lgritz avatar Aug 30 '22 19:08 lgritz

Hi Larry, Thank you for your quick reply. I understand your preference to outsource this task to a dependency entirely dedicated to this. My understanding is that today you can quite easily pass all metadata extracted by Libraw. When I look at LibRaw, I don't see this information among the extracted ones. I opened an issue asking if they could be added https://github.com/LibRaw/LibRaw/issues/488. But it seems to me that today exiftool is the tool providing the most complete set of metadata. Would you consider providing in OIIO an interface with this tool in order to obtain and deliver the metadata associated with the decoded image?

demoulinv avatar Sep 01 '22 07:09 demoulinv

exiftool is a perl program, not a library directly callable from C++, and also it's licensed under GPL, so we would not be able to directly incorporate any part of its source code into OIIO.

I suppose that our raw reader could check if exiftool was available and if so, call exiftool on the command line (that would not run afoul of the GPL terms since it would not be bundled and would be running in a separate execution process), capture the output, parse it, and use it to augment the metadata we gleaned from reading the file ourselves via libraw (which we still have to do to get the pixel values). Do we do this always? Only for the particular formats in which libraw seems to be especially incomplete?

That seems very tricky to orchestrate. And maybe very sensitive to any changes in the formatting of exiftool's output from version to version. It seems like in the grand scheme of things, it would be better to patch libraw to make sure to retrieve all of the data that is currently missing.

lgritz avatar Sep 01 '22 19:09 lgritz

Very interested to get @kdt3rd's opinion here, inasmuch as it may relate to rawtoaces.

lgritz avatar Sep 01 '22 19:09 lgritz

After the merge of #3561, I think this issue can be closed.

demoulinv avatar Oct 24 '22 07:10 demoulinv

What about using this? https://exiftool.org/cpp_exiftool/

We currently use exiftool and add all the keys to the EXR when processing raw with OIIO. It would be nice not to need to do that.

The rawtoaces effort to replace LibRaw seemed quite far away when we met at Siggraph, unless something has changed. Also, ASWF having to reverse engineer every single raw format for every camera seems like a high bar to meet. Even Adobe & Apple have to reverse engineer them with no help from Nikon, Canon, Sony, etc...

dekekincaid avatar Oct 24 '22 07:10 dekekincaid

I didn't know about cpp_exiftool! Though it's already off on the wrong foot with their license information, "This is software, in whole or part, is free for use in non-commercial applications, provided that this copyright notice is retained in the code. A licensing fee may be required for use in a commercial application. Contact Phil Harvey for more information." So that seems problematic, since OIIO is embedded in all sorts of commercial apps.

However, if I understand the docs correctly, cpp_exiftool is just a thin wrapper to launch exiftool on the command line and then parse its output. We could just as easily do that ourselves.

The regular exiftool has a different license, a GPL variant (matching Perl's license with slightly relaxed rules), and I think that it's agreed that although linking (in the same process or address space) would make a derived work that would impose GPL conditions on all the parts, in the case of launching a command line app, that's a separate process, so I don't think it would create a conflict between exiftool's GPL and OIIO's BSD licenses.

So I think it would not be very difficult to modify our rawinput.cpp to see if exiftool was available, and if so, launch an exiftool run on the same file, open a pipe to its output, parse it, and insert that data into the ImageSpec. This approach comes with some tradeoffs:

Pros:

  • Outsource this functionality so we don't have to stay on top of reverse engineering camera formats.
  • By all accounts, exiftool has better decoding and coverage of all camera data than libraw appears to, and libraw's authors don't appear to want to expand that functionality.
  • Straightforward to check exiftool's existence, launch it, and capture its output.

Cons:

  • Additional dependencies -- the functionality would not be built into OIIO, but would require shelling out to exiftool, as well as adding that as an implied dependency (and, in turn, Perl, because exiftool is implemented as a Perl script!).
  • Performance -- in addition to shelling out to launch a separate program (Perl no less), it essentially means reading from the file TWICE (once for OIIO, via libraw, still needed to read the image itself, and a second time for exiftool to read the metadata).
  • Flexibility -- it might be limited to files on disk and make future use of IOProxy difficult. On the other hand, our current raw input doesn't support IOProxy either, and I don't remember if it's possible with libraw either.
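The shell-out approach described above can be sketched roughly as follows. This is an illustrative Python sketch, not the C++ that would live in rawinput.cpp. The function names are hypothetical, and it assumes that exiftool's `-s -G -t` options yield one `Group<TAB>Tag<TAB>Value` line per tag; that assumption should be checked against the exiftool version in use.

```python
import shutil
import subprocess


def parse_exiftool_tabbed(text):
    """Parse 'Group<TAB>Tag<TAB>Value' lines into a {"Group:Tag": value} dict.

    The three-column layout is an assumption about exiftool's tab-delimited
    output; lines that do not fit it are skipped rather than treated as errors.
    """
    metadata = {}
    for line in text.splitlines():
        parts = line.split("\t", 2)  # at most 3 fields; value may contain tabs
        if len(parts) != 3:
            continue
        group, tag, value = parts
        metadata[f"{group}:{tag}"] = value.strip()
    return metadata


def read_exiftool_metadata(filename, exe="exiftool"):
    """Return exiftool metadata for `filename`, or {} if exiftool is unusable.

    Hypothetical helper: the caller would still read pixels (and baseline
    metadata) through libraw, so a failure here must never be fatal.
    """
    path = shutil.which(exe)  # is the executable on PATH at all?
    if path is None:
        return {}
    result = subprocess.run(
        [path, "-s", "-G", "-t", "-m", "-q", "-q", filename],
        capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return {}
    return parse_exiftool_tabbed(result.stdout)
```

Returning an empty dict when exiftool is missing keeps the libraw-only path unchanged, which matches the idea that exiftool is an optional helper rather than a hard dependency.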

lgritz avatar Oct 24 '22 22:10 lgritz

I might take a stab at prototyping this tonight if I have time.

lgritz avatar Oct 24 '22 22:10 lgritz

@dekekincaid I got a chance to try this and it works very well. I'll finish it up and submit a PR soon.

There is an interesting design choice to make, though. There's the metadata we get from libraw, and that which we get from exiftool. Some of them overlap, of course. Most that overlap match, but some do not. (And I have little idea how to decide which is right.) So I'm curious about your opinion on which of these should be our behavior:

  1. Collect metadata from libraw, then exiftool, but ignore exiftool metadata that has the same name as metadata we already have from libraw. (That is, exiftool augments what we have with additional metadata, but doesn't replace or change anything we got from libraw, or what we would report if exiftool was not found. In other words, exiftool-enabled reporting is always a strict superset of what we would report if exiftool was not found.)
  2. Like (1), but exiftool always overwrites identically-named data from libraw. (That is, when they conflict, believe exiftool. This means that metadata may change, not merely be added, when exiftool is enabled.)
  3. When exiftool is available, use it exclusively for the camera metadata, and ignore what libraw has to say about the metadata (except the parts related directly to the decoding of raw pixels, obviously).

Needless to say, this is only for when exiftool is found. When it is not, we use the libraw data as we always did before.
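To make the three behaviors concrete, here is a small Python sketch that treats each metadata source as a plain dictionary. The function name, the policy names ("augment", "overlay", "exclusive") and the `decode_keys` parameter are all hypothetical, and it assumes both sources already use the same names for overlapping entries, which real data may not.

```python
def merge_metadata(libraw, exiftool, policy="overlay", decode_keys=()):
    """Combine libraw and exiftool metadata dicts under one of three policies.

    exiftool=None means exiftool was not found: libraw data is used unchanged.
    decode_keys names the libraw entries tied to raw pixel decoding, which
    are kept even under the "exclusive" policy.
    """
    if exiftool is None:
        return dict(libraw)
    if policy == "augment":        # option 1: exiftool only adds new names
        merged = dict(libraw)
        for name, value in exiftool.items():
            merged.setdefault(name, value)
    elif policy == "overlay":      # option 2: exiftool wins on conflicts
        merged = dict(libraw)
        merged.update(exiftool)
    elif policy == "exclusive":    # option 3: exiftool only, plus decode keys
        merged = dict(exiftool)
        for name in decode_keys:
            if name in libraw:
                merged[name] = libraw[name]
    else:
        raise ValueError(f"unknown policy: {policy!r}")
    return merged
```

Under "augment" the result is always a superset of the libraw-only report with identical values for shared names, while "overlay" can change existing values and "exclusive" can drop entries that only libraw reports.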

Thoughts, anybody?

lgritz avatar Oct 26 '22 21:10 lgritz

I should note that I'm still weirded out by using a shelled-out external command line dependency and reading its output to get metadata from raw files, in ways that I am not by relying on an ordinary linked library dependency. That may not be rational, maybe I should get over it, but it does still make me hesitate a bit about urging people to rely on this.

lgritz avatar Oct 26 '22 21:10 lgritz

@kdt3rd Do you have any thoughts on this whole topic of raw metadata?

lgritz avatar Oct 26 '22 21:10 lgritz

Internally we take the libraw metadata and then overlay the exiftool ones over those, so option 2. When they overlap, we found the exiftool ones more descriptive and more reliable.

For 3, I do not think that would work because there are some which only libraw sees that exiftool does not.

dekekincaid avatar Oct 26 '22 22:10 dekekincaid

OK, I'll go with your recommendation since you have already explored this space.

lgritz avatar Oct 26 '22 22:10 lgritz

@dekekincaid Are there exiftool command line arguments you particularly like? I've noticed that some of them can change the representation of the metadata a bit.

lgritz avatar Oct 26 '22 23:10 lgritz

@lgritz it appears all our code that uses exiftool uses the command: exiftool -fast -s -G -t -m -q -q

dekekincaid avatar Oct 26 '22 23:10 dekekincaid

Hmmm, I'm not sure if -n is good or bad. Without it, what exiftool reports is the data with some rounding and interpretation/explanation (it's a string). But with -n, it's the raw data (which I can give as the correct type in most cases). In many cases, OIIO can already augment the raw data when printing with some human-readable interpretation, so having exiftool do it may be redundant and also precludes getting at the original data.

So a question is: to what extent do you want the metadata to be ACTUAL data in the file (like the float value 3.14159), versus a wordy string like "3.14 mm gear tooth size"? Is the only value of the camera maker data to pretty print it, or will you ever programmatically want the real data that was in the file?
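As a rough illustration of the difference, the Python sketch below shows how text captured from exiftool could be turned back into typed values. The helper name is hypothetical, and the sample strings are the made-up ones from the question above, not real exiftool output. Plain numbers become int or float; anything else stays a string.

```python
def coerce_value(text):
    """Convert a captured string to int or float when it is a plain number.

    Descriptive strings (such as a value with units appended) cannot be
    converted and are returned unchanged.
    """
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass  # not this numeric type; try the next one
    return text


print(type(coerce_value("3.14159")).__name__)                  # float
print(type(coerce_value("3.14 mm gear tooth size")).__name__)  # str
```

The numeric form can be stored with its real type and pretty-printed later, whereas the descriptive form cannot be reliably turned back into the original number.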

lgritz avatar Oct 26 '22 23:10 lgritz