getID3 icon indicating copy to clipboard operation
getID3 copied to clipboard

Incorrect parsing of Quicktime files

Open leenooks opened this issue 1 year ago • 1 comments

Howdy, was just trying getID3 again for my photo/video parsing, and discovered an error parsing the mp3 files (from an iphone).

I've tested with two files, and both show data that doesnt match it's label (I'm guessing from the keys and ilst atoms.

Here is an example:

      "GETID3_VERSION" => "1.9.23-202310190849"
...
      "tags" => array:1 [
        "quicktime" => array:6 [
          "camera.lens_model" => array:1 [
            0 => "iPhone SE (3rd generation) back camera 3.99mm f/1.8"
          ]
          "camera.focal_length.35mm_equivalent" => array:1 [
            0 => "\x00\x00\x00\e"
          ]
          "make" => array:1 [
            0 => "4.559366"
          ]
          "model" => array:1 [
            0 => "-12.9871+123.0725+024.569/"
          ]
          "software" => array:1 [
            0 => "Apple"
          ]
          "creationdate" => array:1 [
            0 => "iPhone SE (3rd generation)"
          ]
        ]

To be sure it wasnt any issue with the files, I wrote my own atom processor, and this is the what it should be:

  #items: array:6 [
    "mdta.com.apple.quicktime.location.accuracy.horizontal" => "4.559366"
    "mdta.com.apple.quicktime.location.ISO6709" => "-12.9871+123.0725+024.569/"
    "mdta.com.apple.quicktime.make" => "Apple"
    "mdta.com.apple.quicktime.model" => "iPhone SE (3rd generation)"
    "mdta.com.apple.quicktime.software" => "17.5.1"
    "mdta.com.apple.quicktime.creationdate" => "2024-07-15T17:32:43+1000"
  ]

leenooks avatar Sep 06 '24 12:09 leenooks

If you're able to spot the error(s) in module.audio-video.quicktime.php any patches to the code are very welcome.

If not, please provide a sample file that shows the problem and I'll try to find out why.

JamesHeinrich avatar Sep 06 '24 14:09 JamesHeinrich

Howdy, just wanted to share back with you. I dont have any patches to share, quicktime seems quite complex and I've struggled myself to understand the detail information for each atom type. As I mentioned, I wrote my own quicktime parser as part of understanding this problem and to get the information I need.

What I think I've discovered with your implementation, is that some data elements are pascal strings, but you are calculating them as a fixed sized string to assume it's value. What that has resulted in the incorrect interpretation of some data (as highlighted above). Also, you are manually keying in offsets in data fields, and while I found 1 error (below), there may be others?

The first case that I've noticed this is in the STSD atom, where the codec (or you call the compressor_name) is a pascal string, but you are assuming it is 4 chars. And while it's value if 4 chars (HEVC), its prefixed with \x04, and you should probably be reading that first char, to know how long the string is.

The second related problem, in my video, the compressor_name is 4 chars HEVC, but your parsing of that file is already off by 4 chars when it gets to it, and thus parses it as C\x00\x00\x00 (frame count and data size are also messed up). I think this is because your missed 4 bytes here (line 835):

								$atom_structure['sample_description_table'][$i]['height']           =   getid3_lib::BigEndian2Int(substr($atom_structure['sample_description_table'][$i]['data'], 18,  2));
								$atom_structure['sample_description_table'][$i]['resolution_x']     = getid3_lib::FixedPoint16_16(substr($atom_structure['sample_description_table'][$i]['data'], 24,  4));

(18 plus 2 bytes, takes you to 20, but you start at 24)

Ultimately, movie file parsing logs issues with parsing of other atoms (my parsing doesnt find these atoms), and I suspect it is related (or maybe it is a red-herring?)

     "warning" => array:5 [
        0 => "Unknown QuickTime atom type: "sgpd" (73 67 70 64), 44 bytes at offset 857564"
        1 => "Unknown QuickTime atom type: "sgpd" (73 67 70 64), 26 bytes at offset 857608"
        2 => "Unknown QuickTime atom type: "sbgp" (73 62 67 70), 44 bytes at offset 857634"
        3 => "Unknown QuickTime atom type: "cdep" (63 64 65 70), 12 bytes at offset 859002"
        4 => "Unknown QuickTime atom type: "cdep" (63 64 65 70), 12 bytes at offset 859627"
      ]

I'm happy to share how I parsed the quicktime files - I took a different approach and compared to yours its a long way from being complete to get specific values (but enough for me to pluck out the stuff I was wanting to get - as well as being less reliant on manual offset starting points).

leenooks avatar Sep 18 '24 07:09 leenooks

Thanks, changed in https://github.com/JamesHeinrich/getID3/commit/ded8b13c8df25d3378056f80a6ddd3a6da36830a Should fix the missing 4-byte offset gap, as well as the Pascal string. It looks like I probably originally misinterpreted "32 byte Pascal string" for "32-bit string". But who knows what documentation I was working from 20 years ago.

Please reopen, preferably with a sample file, if any issues persist.

JamesHeinrich avatar Sep 20 '24 19:09 JamesHeinrich

The sample has been emailed to info[at]getid3[dot]org, please take a look. Thank you!

gino0631 avatar Oct 15 '25 00:10 gino0631

Thanks for the sample file. The quicktime file contains a keys entry to map the field names for data entries. In this sample there are multiple keys data chunks (I wasn't expecting this), and the the later entries should append to the earlier entries not replace them. This is why the data was present but mapped to the wrong fields. Should be fixed in https://github.com/JamesHeinrich/getID3/commit/164c78b75696266652c7ff6f78036ff5a9d7ad7f

JamesHeinrich avatar Oct 15 '25 13:10 JamesHeinrich

Thanks for the fix, the problem seems to be solved. Just two small things:

  1. Are you sure video codec value is correct for that sample?
  2. I have left this comment regarding the locale.

gino0631 avatar Oct 15 '25 15:10 gino0631

  1. Apparently not, it's grabbing garbage data from somewhere, I'll need to investigate.
  2. Thanks, language issue fixed in https://github.com/JamesHeinrich/getID3/commit/638be019bb6c379d69001231278ab418427f5442

JamesHeinrich avatar Oct 16 '25 14:10 JamesHeinrich

Badly handled metadata should be better in https://github.com/JamesHeinrich/getID3/commit/3c7e76b0c2c0243fb6c84bbd222f21ce691c22a3

JamesHeinrich avatar Oct 17 '25 16:10 JamesHeinrich

Looks good, thanks!

gino0631 avatar Oct 18 '25 20:10 gino0631