
application/mp4 incorrectly maps to file extension mp4s.

Open Bdthomson opened this issue 5 years ago • 17 comments

Source IANA - https://www.iana.org/assignments/media-types/application/mp4 - "mp4 and mpg4"

https://github.com/jshttp/mime-db/blob/master/db.json#L908 - "mp4s"

Bdthomson avatar Aug 11 '20 21:08 Bdthomson

Thank you for the report. It looks like the current mapping is coming from Apache.

Can you clarify what the outcome you are looking for is? (a) add those two extensions to the list (b) remove the apache one from the list or (c) both?

dougwilson avatar Aug 11 '20 21:08 dougwilson

Ah, that's a bit odd, I would have expected the source to say "apache" as some other mime types in that list do.

I've never heard of mp4s and can't find any record of this file extension even existing; it doesn't show up when you search any of these sites:

https://www.fileinfo.com https://www.filext.com https://www.file-extensions.org

As for the outcome, I would expect (c): both of those added to the list and mp4s removed.

Bdthomson avatar Aug 11 '20 21:08 Bdthomson

Thanks! So the source is defined in the README as "where the mime type is defined.", and in this case, indeed the MIME type is defined in IANA. We do not include where the extensions are sourced from in the database.

So there is good news and bad news on that:

Good news -- I will look into what is keeping the extensions that are in IANA from showing up in our database.

Bad news -- We cannot remove mp4s because it is coming from an upstream source; if you feel strongly about removing it, you would need to get it removed from Apache.

dougwilson avatar Aug 12 '20 03:08 dougwilson

@dougwilson

Ref: Apache MIME types

Snippet of that page:

...
# The table below contains both registered and (common) unregistered types.
...
application/mp4					mp4s
...
audio/mp4					m4a mp4a
...
video/mp4					mp4 mp4v mpg4

We cannot remove mp4s because it is coming from an upstream source

Interesting text comment in the spec snippet.

There is no reference to mp4s, or to m4p either, in this package, in this Apache file, or in the Ubuntu file associations.

I will look into what is keeping the extensions that are in IANA from showing up in our database

Ref: https://tools.ietf.org/html/rfc4337

"a) if the file contains neither visual nor audio presentations, but only, for example, MPEG-J or MPEG-7, use application/mp4;

b) for all other files, including those that have MPEG-J, etc., in addition to video or audio streams, video/mp4 should be used; however:

c) for files with audio but no visual aspect, including those that have MPEG-J, etc., in addition to audio streams, audio/mp4 may be used.

In any case, these indicate files conforming to the "MP4" specification, ISO/IEC 14496-1:2000, systems file format."
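
The three rules quoted above can be sketched as a small function. This is only an illustration: `pickMp4MediaType` and its `hasVideo` / `hasAudio` flags are hypothetical names, and since rule (c) says audio/mp4 "may" be used, returning it for audio-only files is a choice made here rather than a requirement.

```javascript
// Sketch of the RFC 4337 selection rules quoted above.
// hasVideo / hasAudio are hypothetical flags describing the streams in the file.
function pickMp4MediaType({ hasVideo, hasAudio }) {
  // (a) neither visual nor audio presentation, e.g. only MPEG-J or MPEG-7
  if (!hasVideo && !hasAudio) return "application/mp4";
  // (c) audio but no visual aspect: audio/mp4 may be used
  if (hasAudio && !hasVideo) return "audio/mp4";
  // (b) all other files
  return "video/mp4";
}

console.log(pickMp4MediaType({ hasVideo: false, hasAudio: false })); // application/mp4
console.log(pickMp4MediaType({ hasVideo: false, hasAudio: true })); // audio/mp4
console.log(pickMp4MediaType({ hasVideo: true, hasAudio: true })); // video/mp4
```

The catch is that the flags can only be known by inspecting the file contents, which an extension lookup cannot do.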

Martii avatar Aug 12 '20 19:08 Martii

That's correct on Apache; of course registered types are in IANA. If we only cared about those we would never need to consume Apache :) We consume the Apache file in order to get the "unregistered" ones they provide, which include a lot of very useful ones not in the IANA database.

dougwilson avatar Aug 12 '20 19:08 dougwilson

Here we go for mp4s:

Ref: https://tools.ietf.org/html/rfc6381

"When the first element of a value is 'mp4a' (indicating some kind of MPEG-4 audio), or 'mp4v' (indicating some kind of MPEG-4 part-2 video), or 'mp4s' (indicating some kind of MPEG-4 Systems streams such as MPEG-4 BInary Format for Scenes (BIFS)), the second element is the hexadecimal representation of the MP4 Registration Authority ObjectTypeIndication (OTI), as specified in [MP4RA] and [MP41] (including amendments). Note that [MP4RA] uses a leading "0x" with these values, which is omitted here and hence implied."
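
To make the quoted rule concrete, here is a small sketch that splits one such value into its elements. `parseMp4Codec` is a hypothetical helper, `mp4a.40.2` is just a sample value, and only the three prefixes named in the quote are handled.

```javascript
// Split a codecs element such as "mp4a.40.2" per the rule quoted above.
// Only the three prefixes named in the quote are recognized.
function parseMp4Codec(value) {
  const [first, second, ...rest] = value.split(".");
  if (!["mp4a", "mp4v", "mp4s"].includes(first)) return null;
  // The second element is hexadecimal, written without the leading "0x".
  if (!/^[0-9a-fA-F]+$/.test(second ?? "")) return null;
  return { sampleEntry: first, oti: parseInt(second, 16), rest };
}

console.log(parseMp4Codec("mp4a.40.2")); // oti is 64 (0x40), rest is ["2"]
console.log(parseMp4Codec("avc1.42E01E")); // null, prefix not covered by this rule
```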

Still no m4p found though other than here... this is puzzling. EDIT: Ref: https://www.loc.gov/preservation/digital/formats/fdd/fdd000052.shtml

"For sound files. The m4p extension is for QuickTime files containing AAC bitstreams purchased from iTunes and protected by a digital rights management scheme. Bookmarkable AAC files may carry the extension m4b. [The mp3 extension is for QuickTime sound files containing MP3 bitstreams; extent of protection unknown at this writing.]"

Martii avatar Aug 12 '20 19:08 Martii

So taking a look at our database, we do actually have a MIME type with mpg4 and mp4 already: video/mp4 (https://www.iana.org/assignments/media-types/video/mp4) which is coming from Apache.

So you want both video/mp4 and application/mp4 to list out those extensions, is that correct?

dougwilson avatar Aug 12 '20 19:08 dougwilson

P.S. if it helps at all, I just noticed the IANA registry points to an RFC that was obsoleted by another RFC. And that RFC notes that all these MPEG-related types are registered in their own alliance registry: http://mp4ra.org/

dougwilson avatar Aug 12 '20 19:08 dougwilson

is that correct?

Depends on this project's hierarchy of precedence, or whether the sources are peer-leveled. MPEG anything catches my eye since I've worked on it in the past. Apple hasn't always registered their MIME types but it is in the Library of Congress link... so it's "kind of" there, in history.

Because of rfc4337 conditionals it makes it a little more difficult.

Martii avatar Aug 12 '20 19:08 Martii

Well, as far as this project is concerned, we are not in the business of digging through documents to figure out what is "right" and what is not -- that is an entire project in and of itself. The goal of this project is simply to aggregate the three sources listed at the top of the README into a nice little JSON file format for folks to consume.... So whatever those sources say is associated with what is what this module is going to say. The reason I'm asking those questions is to identify which of those three upstreams to correct if it's wrong or needs changing.

dougwilson avatar Aug 12 '20 19:08 dougwilson

Well, it's the chicken-and-egg syndrome in my book. One needs to get at the MIME type to determine if it's a binary audio (conditional a in rfc4337), but this can be a "list" of MIME types, usually included to determine the MIME type of a binary file in server side projects and flipped for client side.

Personally I'd probably leave it as is. This is probably one of those "(common) unregistered types" in Apache... as long as mp4 relates to video/mp4 ~~and audio/mp4~~ I think it's okay as is... but the source should indicate Apache and anything else needed (for the other extensions)... which is why I looked it up for ya. :)

simply aggregate

Yes but who has higher priority if any? IANA over Apache, etc. or do you just peer merge it discarding duplicates?

Martii avatar Aug 12 '20 20:08 Martii

It is just all merged together. For example, the MIME types are the keys of an object, so it's not possible to have duplicates. The extensions are just all joined together in the extensions array. The goal is that if it exists in at least one of the sources, it exists in this database.
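
A minimal sketch of this kind of merge, not the project's actual build script. The `mergeSources` helper and the sample inputs are hypothetical; the sample entries only mirror the mappings quoted in this thread, and skipping an extension that is already listed is an assumption made here.

```javascript
// Hypothetical per-source maps: MIME type -> list of extensions.
// The sample data mirrors the mappings quoted in this thread.
const iana = { "application/mp4": ["mp4", "mpg4"] };
const apache = {
  "application/mp4": ["mp4s"],
  "video/mp4": ["mp4", "mp4v", "mpg4"],
};

// Merge sources in order. Types are object keys, so a type can appear only
// once; extensions are appended, skipping any already present (assumption).
function mergeSources(...sources) {
  const db = {};
  for (const source of sources) {
    for (const [type, extensions] of Object.entries(source)) {
      const entry = db[type] ?? (db[type] = { extensions: [] });
      for (const ext of extensions) {
        if (!entry.extensions.includes(ext)) entry.extensions.push(ext);
      }
    }
  }
  return db;
}

const db = mergeSources(iana, apache);
console.log(db["application/mp4"].extensions); // [ 'mp4', 'mpg4', 'mp4s' ]
console.log(db["video/mp4"].extensions); // [ 'mp4', 'mp4v', 'mpg4' ]
```

In this sketch the order of the arguments only affects the order of the extensions array; nothing from a later source is dropped.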

dougwilson avatar Aug 12 '20 20:08 dougwilson

Is it too big of a leap to show multiple sources on merge to help alleviate confusion? CSV separated (or pipes) or even a new field to show the precedence or would that be too breaking?

Martii avatar Aug 12 '20 20:08 Martii

We cannot make that type of change without a major version change and even changing the name of the db file, as this project really took off more than we ever expected, and there are so many folks just pulling the file direct from master all over the Internet... That field is in use by various folks as well.

dougwilson avatar Aug 12 '20 20:08 dougwilson

Okay... well I guess this is one of those paradoxes where one has to check existing issues to see if it's been explained from what source. Apache has one of the application types but the other is the Library of Congress, in history,... sooo... on the merge of IANA and Apache... it picks IANA, probably as the "governing" factor on pulls from those sites. *shrugs*

Martii avatar Aug 12 '20 20:08 Martii

So this module does provide them split out in the src/ directory if you wanted to see the individual split out contents to see what is coming from where. I'm not sure if that answers what you're looking for or not on that front.

dougwilson avatar Aug 12 '20 20:08 dougwilson

Re: @Bdthomson

Ah, that's a bit odd, I would have expected the source to say "apache" as some other mime types in that list do.

...

I'm not sure if that answers what you're looking for...

Just providing a possible explanation for the author of this issue. I'm sure you've said the same thing over and over through the years with this project but my focus is not always on GH searching. ;) Sometimes it helps to have another voice asking the questions to get it in different wording. :) Plus I get a little better understanding of this project and its use cases.

So this module does provide them split out in the src/ directory if you wanted to see the individual split out contents to see what is coming from where.

Good to know. Although that may not be useful programmatically, since the data is usually accessed from the merged file... which is great, but that's why I asked about priority in the final list.

Martii avatar Aug 12 '20 20:08 Martii

The application/mp4 mime in the database now includes the two extensions from the IANA entry as the first ones.

dougwilson avatar Feb 06 '23 04:02 dougwilson