
MP4 time stamps still a muddle

Open jefftucker1952 opened this issue 4 years ago • 22 comments

With v2.13, the timestamps returned by M-E are better than they were, but are still a bit of a muddle.

Let's start with one important fact: AFAIK, there is only one piece of time information in the metadata of a standard MP4, and that's the time the video was created, in GMT. There is no localization information in the file.

Take an example: (file removed). The file name tells part of the story - July 22, 2017, at 13:36:49, local time. Summer in the Alps, so Central European time, daylight saving time in effect. But the only information in the metadata is the creation time in GMT: 2017:07:22 11:37:03 (it took the phone a few seconds to stash the video in its memory). That's the only number that is of any use to anyone.

But M-E is telling me the following (among other things):

[MP4] Creation Time - Sat Jul 22 07:37:03 EDT 2017
[MP4 Video] Creation Time - Sat Jul 22 07:37:03 -04:00 2017
[MP4 Sound] Creation Time - Sat Jul 22 07:37:03 -04:00 2017

It seems to think that the fact that I'm currently sitting in Connecticut (GMT-5, daylight time not in effect) is somehow relevant. But it's not. It would be like pulling the exposure time from a JPG, realizing that the user is currently traveling close to light speed, and telling him how to adjust for relativistic time dilation. It just doesn't matter - the only useful piece of information is the exposure time recorded by the camera. The same is true here - what time it was in Connecticut when that video was shot is a useless data point.

True, the application could derive the correct GMT timestamp from the information provided, but it shouldn't have to. M-E should be returning only the information it has:

[MP4] Creation Time - Sat Jul 22 11:37:03 2017
[MP4 Video] Creation Time - Sat Jul 22 11:37:03 2017
[MP4 Sound] Creation Time - Sat Jul 22 11:37:03 2017

Instead, it's trying to do me a "favor."

jefftucker1952 avatar Jan 21 '20 20:01 jefftucker1952

From what I can see, the reported dates are correct. I'm curious why EDT is used for the first date and -04:00 for the others, but from what I understand they mean the same?

That said, Sat Jul 22 07:37:03 -04:00 2017 and Sat Jul 22 11:37:03 UTC/+00:00 2017 specify the exact same time. Since Java Date instances are all stored as UTC, the conversion to a specific time-zone is done at the time when the Date is converted into a String. If you merely use toString(), your Java installation will "make the choice for you" - if you disagree with the choice, you must explicitly format it with the time zone you want, for example using a SimpleDateFormat, or a DateTimeFormatter applied to date.toInstant().
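A minimal sketch of such explicit formatting, using only the JDK. The class name, method name and pattern are made up for illustration and are not part of Metadata-extractor; the point is only that the same Date renders differently depending on the zone you pass in.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper, not library code.
class ExplicitDateFormatting {
    // Locale is pinned so day and month names do not depend on OS settings.
    private static final DateTimeFormatter PATTERN =
            DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss xxx yyyy", Locale.US);

    /** Renders the instant held by a Date in the given zone, ignoring the JVM default zone. */
    static String format(Date date, ZoneId zone) {
        return PATTERN.withZone(zone).format(date.toInstant());
    }

    public static void main(String[] args) {
        Date created = Date.from(Instant.parse("2017-07-22T11:37:03Z"));
        // Sat Jul 22 11:37:03 +00:00 2017
        System.out.println(format(created, ZoneOffset.UTC));
        // Sat Jul 22 07:37:03 -04:00 2017
        System.out.println(format(created, ZoneId.of("America/New_York")));
    }
}
```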

Thus, your goal of actually finding the local light conditions and automatically compensating for this seems impossible unless you have the location/time-zone information from another source than the MP4 metadata. If you have this information on the other hand, it should be easy to format the date for that time zone.
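As a sketch of that last case: if the capture zone is known from somewhere else, the conversion is a single call. The zone ID "Europe/Zurich" is an assumption here (the MP4 itself does not provide it), and the class and method names are made up.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.Date;

// Hypothetical helper, not library code.
class LocalCaptureTime {
    /** Interprets the UTC instant in a zone supplied by the caller. */
    static ZonedDateTime atCaptureLocation(Date utcTimestamp, ZoneId captureZone) {
        return utcTimestamp.toInstant().atZone(captureZone);
    }

    public static void main(String[] args) {
        Date created = Date.from(Instant.parse("2017-07-22T11:37:03Z"));
        // The zone is an assumption supplied by the user or application.
        ZonedDateTime local = atCaptureLocation(created, ZoneId.of("Europe/Zurich"));
        System.out.println(local.toLocalDateTime()); // 2017-07-22T13:37:03
        System.out.println(local.getOffset());       // +02:00
    }
}
```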

Nadahar avatar Jan 21 '20 20:01 Nadahar

Sorry if I seem obtuse, but what does EDT have to do with the time stamp on that MP4?

This is like having the aperture on a JPG that's f5,6 but having M-E tell me that it's f2,8 EDT, and if I don't like that, I can do my own conversion.

The local time zone of the user is not relevant. Period. Why bother reporting it at all? And if my application has other information with which it can determine the actual local time (like grabbing the file name in this case, though I naturally wouldn't do that - too many chances to screw it up), the EDT information is of no value. Ever. All I need from M-E is, literally, 2017:07:22 11:37:03. If my application wants to change the metadata to 2017:07:22 13:37:03 to reflect the local reality, it certainly doesn't want M-E then turning around and second-guessing what I've done.

I hate to mention the competition, but exiftool gets this right, and has for years.

jefftucker1952 avatar Jan 21 '20 21:01 jefftucker1952

You're missing an essential issue. Exiftool is an application, although a command line one. It formats the output for the end user. Metadata-extractor is a Java library which can't be used by end users directly. You need a (Java) application to operate Metadata-extractor's API to use it. Metadata-extractor simply provides a Java Date object, which is by definition in UTC (it has no time-zone information and date and time values are always stored in UTC). It is the application you use, whatever that is, which converts the Date into a String (a text representation), and this is where the time-zone conversion takes place. So, you're "barking up the wrong tree".

If somebody uses Metadata-extractor to make a command line tool, like Exiftool, that command line tool would be responsible for not applying the local time-zone when reporting dates.

Nadahar avatar Jan 21 '20 21:01 Nadahar

I think the underlying problem is that the time stamp in an MP4 isn't really a Date object, or at least can't be treated like one. Unlike the EXIF date in a JPG, it creates its own reality. It wouldn't be so bad if you could easily add your own "real date and time" to the metadata of an MP4, but that appears to be a tall order, especially since they don't seem to support xmp, which would be the ideal solution.

Where I'm coming from is that I have users who want to display the time stamp on their videos in their web albums. The metadata in the MP4 doesn't contain enough information to cough up the real date and time. Only the user can supply the missing info ("I was in the Fextal!"). But then there's no way for him to plug the real date and time back into the MP4 for future use. If he changes the existing timestamp, the next time he uses that video in a project, M-E is going to assume that it still contains GMT when, in fact, it's been "corrected" by the user.

Theory is all very nice, but the real world is looking for something else.

jefftucker1952 avatar Jan 21 '20 21:01 jefftucker1952

Your criticism here is really that of the MP4 metadata themselves, specifically the lack of time-zone information. If you read the relevant part of the MPEG-4 standard that I referred to here: https://github.com/drewnoakes/metadata-extractor/issues/408#issuecomment-512618627

..you will see that the standard dictates that the timestamp in the metadata must be in UTC. So, if a user "corrects" the timestamp by compensating for the time-zone, he or she is really "corrupting" the information. I don't understand how Metadata-extractor could do anything other than "assume" that it's in UTC, since that's what it is by definition.

The real problem is the lack of time-zone/geographical information in the MP4 metadata.

Nadahar avatar Jan 21 '20 21:01 Nadahar

Yes, I almost can't believe that they screwed that up. My camera knows what time it is, and my phone certainly does. Neither one has any problem recording the local time on a JPG. Standards-writing by committee.

jefftucker1952 avatar Jan 21 '20 21:01 jefftucker1952

I browsed a bit in the standard, and it turns out it's more complicated. From what I understand, the current information is retrieved from the "Movie Header Box" (8.2.2) with code mvhd. This "box" is mandatory and is always present in video files. Think of it as the "most basic metadata" in an MP4 file.
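For illustration, a rough sketch of reading creation_time from the body of an mvhd box, based on my reading of the box layout in ISO/IEC 14496-12 (a version byte, three flag bytes, then a 32-bit creation_time for version 0 or a 64-bit one for version 1). This is not how Metadata-extractor does it; the class and method names are made up, and the buffer is assumed to start right after the box's size/type header.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not library code.
class MvhdCreationTime {
    /** Returns creation_time in seconds since 1904-01-01T00:00:00Z. */
    static long readCreationTime(ByteBuffer body) {
        // Work on a duplicate so the caller's position is left untouched.
        ByteBuffer buf = body.duplicate().order(ByteOrder.BIG_ENDIAN);
        int version = buf.get() & 0xFF;       // FullBox version
        buf.get(); buf.get(); buf.get();      // 24-bit flags, unused here
        return version == 1
                ? buf.getLong()                // 64-bit creation_time
                : buf.getInt() & 0xFFFFFFFFL;  // 32-bit unsigned creation_time
    }

    public static void main(String[] args) {
        ByteBuffer body = ByteBuffer.allocate(8);
        body.put(new byte[] {0, 0, 0, 0});     // version 0, flags 0
        body.putInt((int) 3_583_568_223L);     // sample value above Integer.MAX_VALUE
        body.flip();
        System.out.println(readCreationTime(body)); // 3583568223
    }
}
```

The unsigned mask matters: current timestamps counted from 1904 no longer fit in a signed 32-bit int.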

In addition, a whole host of other "boxes" can be included with further metadata in the meta box. This is optional, so it will depend on the encoding application/device what information, if any, is provided. In there, a lot of different metadata can be stored, even complete XML documents.

So, it seems that there absolutely is support in the standard to store further information. The problem is that this seems to be optional, so one would need to figure out what information is "usually" available in actual encoder/muxer implementations. I don't know enough about MP4 to know this, but there might be a "box" in which this information can usually be found. I don't think Metadata-extractor supports very many box types as of now, though; I think only the "basic boxes" are implemented.

So, if someone puts enough effort into researching this, it might turn out that there is a way one can, usually, figure out the time-zone.

Nadahar avatar Jan 21 '20 22:01 Nadahar

At a minimum, I'd suggest that a single format be adopted for the six common dates - MP4 Creation, MP4 Modification, MP4 Video Creation, MP4 Video Modification, MP4 Sound Creation, and MP4 Sound Modification. The first two are using the clumsy EDT notation, which satisfies absolutely no one. The other four are using the "time -4:00" syntax, which is at least defensible.

BTW, I'm just using Drew's own script to extract this stuff - I'm not doing anything strange to the output:

public static void saveMetadataToFile(File image, File output)
    throws ImageProcessingException, IOException
{
    Metadata metadata = ImageMetadataReader.readMetadata(image);
    PrintStream printStream = new PrintStream(output);
    try {
        for (Directory directory : metadata.getDirectories()) {
            for (Tag tag : directory.getTags()) {
                printStream.println(tag);
            }
        }
    } finally {
        printStream.close();
    }
}

jefftucker1952 avatar Jan 21 '20 22:01 jefftucker1952

This is where you apply the time-zone:

printStream.println(tag)

By not explicitly formatting it, and only using the implicit .toString(), you "fall back" to the standard Java handling, which is to transform timestamps to the OS defined time-zone.
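A small JDK-only sketch of that fallback. It only demonstrates how Date.toString() follows the JVM default zone; how the library actually builds its tag descriptions may differ. The class and method names are made up.

```java
import java.time.Instant;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical demo, not library code.
class DefaultZoneDemo {
    /** Returns Date.toString() as seen on a machine whose default zone is zoneId. */
    static String toStringIn(String zoneId, Instant instant) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            return Date.from(instant).toString();
        } finally {
            TimeZone.setDefault(saved); // restore the real default
        }
    }

    public static void main(String[] args) {
        Instant created = Instant.parse("2017-07-22T11:37:03Z");
        // Sat Jul 22 07:37:03 EDT 2017
        System.out.println(toStringIn("America/New_York", created));
        // Sat Jul 22 11:37:03 UTC 2017
        System.out.println(toStringIn("UTC", created));
    }
}
```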

Nadahar avatar Jan 21 '20 22:01 Nadahar

So why don't I see the same result with all 6 Date objects?

jefftucker1952 avatar Jan 21 '20 23:01 jefftucker1952

My guess is that there are some slight differences in the toString() implementation for these tags. I agree that it's strange that it's inconsistent, but I'm sure there's a reason if one digs into it. Still, if you didn't rely on toString(), you could format it however you wanted.

Nadahar avatar Jan 21 '20 23:01 Nadahar

The MovieHeader tags have a bit more manipulation to them seen here: https://github.com/drewnoakes/metadata-extractor/blob/270db1ed32494d758109b85be961dfc97aae866d/Source/com/drew/metadata/mp4/boxes/MovieHeaderBox.java#L80-L81

#416 For some context

payton avatar Jan 21 '20 23:01 payton

@payton Indeed, it seems like this 1904 epoch isn't applied to, for example, the track header: https://github.com/drewnoakes/metadata-extractor/blob/270db1ed32494d758109b85be961dfc97aae866d/Source/com/drew/metadata/mp4/boxes/TrackHeaderBox.java#L49-L56

Still, as they are stored as long, I don't quite understand how that should impact the formatting when converted to text.

Nadahar avatar Jan 21 '20 23:01 Nadahar

@Nadahar Ah, yes, that too. There is also the media header box that may need to be changed https://github.com/drewnoakes/metadata-extractor/blob/270db1ed32494d758109b85be961dfc97aae866d/Source/com/drew/metadata/mp4/boxes/MediaHeaderBox.java#L38-L46

payton avatar Jan 21 '20 23:01 payton

Actually, the track header seems like a bug to me. According to ISO/IEC 14496-12:2015 8.3.2.3 (page 25):

  • creation_time is an integer that declares the creation time of this track (in seconds since midnight, Jan. 1, 1904, in UTC time).
  • modification_time is an integer that declares the most recent time the track was modified (in seconds since midnight, Jan. 1, 1904, in UTC time).

This means that the same epoch conversion should be applied there as well, AFAICU.
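To make the conversion concrete, here is a sketch of what it amounts to, independent of how the library implements it. The class, constant and method names are made up; the offset is the 24,107 days (66 years including 17 leap days) between the two epochs.

```java
import java.time.Instant;

// Hypothetical helper, not library code.
class Mp4Epoch {
    /** Seconds from 1904-01-01T00:00:00Z to 1970-01-01T00:00:00Z (24107 days). */
    static final long SECONDS_1904_TO_1970 = 24_107L * 86_400L; // 2082844800

    /** Converts an MP4 creation_time or modification_time to an Instant. */
    static Instant toInstant(long secondsSince1904) {
        return Instant.ofEpochSecond(secondsSince1904 - SECONDS_1904_TO_1970);
    }

    public static void main(String[] args) {
        System.out.println(toInstant(0L));             // 1904-01-01T00:00:00Z
        System.out.println(toInstant(3_583_568_223L)); // 2017-07-22T11:37:03Z
    }
}
```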

Nadahar avatar Jan 21 '20 23:01 Nadahar

@payton Maybe a search for creationTime and modificationTime in /mp4/boxes/ is a good idea?

Edit: The epoch is the same for the MediaHeaderBox (8.4.2.3), and the same is probably true for all "standard" boxes where dates appear.

Nadahar avatar Jan 21 '20 23:01 Nadahar

@Nadahar I was just checking the documentation, too. Yeah, it seems like all dates should be formatted that way.

payton avatar Jan 21 '20 23:01 payton

@payton I'm not familiar enough with the logic for these directories, but I see in Mp4BoxHandler that they are handled differently.

For the MovieHeader, the epoch is converted in addMetadata(), but the MediaHeader doesn't even have such a method. TrackHeader has the method, but doesn't set the timestamps as far as I can see. So, at the moment, I don't know where the "Video" and "Sound" are stored - or where they are set.

Nadahar avatar Jan 21 '20 23:01 Nadahar

@Nadahar I'm confusing myself, but it may be getting set properly here https://github.com/drewnoakes/metadata-extractor/blob/ef906fb58057cb3355577c2e7c2d86aa5a4472b6/Source/com/drew/metadata/mp4/Mp4MediaHandler.java#L40-L51 This part gets a little more confusing because media boxes such as sound or video are all within the same "type" of box, so I introduced a context object to account for the current type of box being processed. I believe this should set all media type dates as we expect, so I'm not sure why we are seeing the difference that @EarlyOut pointed out.

payton avatar Jan 22 '20 00:01 payton

@payton I have a suspicion that the difference is in the code that generates toString(), if I could only figure out where that code is..

Nadahar avatar Jan 22 '20 00:01 Nadahar

@Nadahar Well, there is what's in the TagDescriptor class https://github.com/drewnoakes/metadata-extractor/blob/81143f746e6774eb7625e374b3f6e56396288d1a/Source/com/drew/metadata/TagDescriptor.java#L83-L88

However, I don't believe I created any special descriptors for the creationDate/modificationDate since they were just set as dates.

payton avatar Jan 22 '20 00:01 payton

@payton That formatting looks suspect to me, what exactly does the replace do? This also goes back to the long-standing problem that Locale isn't handled, so that Locale.getDefault() is applied by Java, which produces different strings for different "regional OS settings". I still don't understand exactly how this strange outcome is produced, but I guess setting the same locale and doing some tracing would reveal it.
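To illustrate the Locale point with plain JDK code (this is not the TagDescriptor implementation; the pattern, class and method names are made up): the same Date and zone give different text once the locale changes, so pinning both is the only way to get stable output.

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo, not library code.
class LocaleIndependentFormat {
    /** Formats with an explicit zone and locale instead of the JVM defaults. */
    static String format(Date date, TimeZone zone, Locale locale) {
        // XXX prints the offset with a colon, e.g. -04:00
        SimpleDateFormat f = new SimpleDateFormat("EEE MMM dd HH:mm:ss XXX yyyy", locale);
        f.setTimeZone(zone);
        return f.format(date);
    }

    public static void main(String[] args) {
        Date created = Date.from(Instant.parse("2017-07-22T11:37:03Z"));
        TimeZone newYork = TimeZone.getTimeZone("America/New_York");
        // Sat Jul 22 07:37:03 -04:00 2017
        System.out.println(format(created, newYork, Locale.US));
        // Day and month names change with the locale; exact text depends on the JDK.
        System.out.println(format(created, newYork, Locale.GERMANY));
    }
}
```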

Nadahar avatar Jan 22 '20 00:01 Nadahar