brunnhilde
Feature: Report more accurate MAC dates
Connected to https://github.com/tw4l/brunnhilde/issues/53
Brunnhilde/Siegfried report on the file created and modified dates as they are in the file system where files are being scanned. Sometimes files contain more accurate timestamps within their internal metadata. If such dates are found, we should also report on or even prefer these dates, as they are likely to be more useful for an archivist.
I think that these embedded dates are indeed better than the file system dates, but there are some edge cases where they're just as misleading. I'm currently dealing a lot with the following scenario:
- RAW photograph has an embedded date of creation that reflects when the photograph was taken
- The photographer made a colour-corrected, cropped version in JPEG form, which ExifTool also reports with a SubSecDateTimeOriginal identical to the RAW photograph's date.
- The JPEG will also carry a second creation date, which reflects the export from Photoshop, when the colour correction and cropping were finished.
- In a sense, the export date is arguably the true date of creation for the JPEG, as this is when the edits were made, though the date that the original photograph was taken is also valuable.
So perhaps just reporting all the dates is the best way to go, leaving it up to the user to perform the detective work. This is why I think that the dateCreatedByApplication value in some PREMIS/METS files can't be automated too well.
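To make the "report everything" idea concrete, here is a rough Python sketch of how it could look if ExifTool were the source of embedded dates. This is only an illustration, not how Brunnhilde works today: the function names `parse_exif_date`, `collect_dates` and `read_time_tags` are hypothetical, the sample values are made up, and it assumes ExifTool's default `YYYY:MM:DD HH:MM:SS` date layout while ignoring sub-seconds and time zones.

```python
import json
import re
import subprocess
from datetime import datetime

# ExifTool's default date layout, e.g. "2019:05:04 10:22:31". Anything after the
# seconds (fractional seconds, time zone) is ignored in this sketch.
EXIF_DATE = re.compile(r"^(\d{4}):(\d{2}):(\d{2}) (\d{2}):(\d{2}):(\d{2})")


def parse_exif_date(value):
    """Return a naive datetime for an ExifTool-style string, or None."""
    if not isinstance(value, str):
        return None
    match = EXIF_DATE.match(value)
    if match is None:
        return None
    try:
        return datetime(*(int(part) for part in match.groups()))
    except ValueError:
        # Placeholder values such as "0000:00:00 00:00:00" are not real dates.
        return None


def collect_dates(tags):
    """Keep every tag whose value parses as a date; nothing is ranked or dropped."""
    dates = {}
    for name, value in tags.items():
        parsed = parse_exif_date(value)
        if parsed is not None:
            dates[name] = parsed
    return dates


def read_time_tags(path):
    """Ask ExifTool for the time-related tags of one file (needs exiftool on PATH)."""
    result = subprocess.run(
        ["exiftool", "-j", "-G", "-time:all", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)[0]


# Made-up tag values resembling the edited JPEG in the scenario above.
sample = {
    "SourceFile": "photo.jpg",
    "EXIF:Make": "Canon",
    "EXIF:ModifyDate": "2019:06:01 14:03:12",
    "Composite:SubSecDateTimeOriginal": "2019:05:04 10:22:31.45",
    "File:FileModifyDate": "2021:01:15 09:00:00+00:00",
    "XMP:CreateDate": "0000:00:00 00:00:00",
}

for tag, when in sorted(collect_dates(sample).items(), key=lambda item: item[1]):
    print(tag, when.isoformat())
```

Keeping the tag name next to each date is deliberate: it lets the reader of the report decide whether the capture date or the export date matters for their purposes, rather than having the tool guess.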
A related issue to consider with MAC dates is file timestamps not being preserved when files are carved from a disk image, either by tsk_recover or the UDF mount-and-copy routine.
I just encountered another relevant use case: ePADD can extract all email attachments from an email archive, for example an mbox file. These attachments all have the date of extraction as their file system date, so a Brunnhilde report on the attachments does not show the correct time span. If Brunnhilde could detect other types of embedded datetime values, it could provide more meaningful time spans.
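As a rough sketch of what a more meaningful time span could mean, the Python below pools every embedded date per file and only falls back to the file system date when a file has none. The record layout (`filesystem` and `embedded` keys), the function name `time_span` and the sample dates are all assumptions made for illustration; none of this is existing Brunnhilde behaviour.

```python
from datetime import datetime


def time_span(files):
    """Return (earliest, latest) across all files, or None if there are no files.

    Each record is a dict with a "filesystem" datetime and an "embedded" list of
    datetimes. Embedded dates are used when present; otherwise the file system
    date is the only evidence available and is used instead.
    """
    pool = []
    for record in files:
        embedded = record.get("embedded") or []
        pool.extend(embedded if embedded else [record["filesystem"]])
    if not pool:
        return None
    return min(pool), max(pool)


# Made-up attachments that all share the extraction date on the file system.
extraction_day = datetime(2021, 1, 15, 9, 0, 0)
attachments = [
    {"filesystem": extraction_day,
     "embedded": [datetime(2003, 4, 2, 11, 30), datetime(2003, 4, 9, 16, 5)]},
    {"filesystem": extraction_day,
     "embedded": [datetime(2007, 11, 20, 8, 45)]},
    {"filesystem": extraction_day, "embedded": []},
]

span = time_span(attachments)
print(span[0].date(), "to", span[1].date())
```

Note that the third attachment has no embedded date, so the extraction date still stretches the end of the span. A report would probably need to flag which files fell back to file system dates so that the user can discount them.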
I acknowledge that this would probably involve scanning the files with Tika, MediaInfo, ExifTool, etc. as well, so it's a huge task.