mutagen
MP4 Chapter support
moov.udta.chpl
https://www.adobe.com/content/dam/Adobe/en/devnet/flv/pdfs/video_file_format_spec_v10.pdf
Maybe expose as MP4Tags.chapters
Turns out there are 4 different standards for chapters, so I'm not even sure where to begin.
Any news? lazka mentioned 4 standards; could you share more information about them? And has anybody started working on this?
I haven't looked into it since then.
I had some m4b files lying around and some time to dabble. The code below seems to parse the chpl box and its chapter records as described in https://www.adobe.com/content/dam/acom/en/devnet/flv/video_file_format_spec_v10.pdf, at least for the files I have on hand.
- Do you have some example of how chapter information could be exposed?
- ~~I have no idea what the master timescale is and therefore no idea how to interpret the chapter timestamps.~~
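One way the chapter information might be exposed is as a plain list of (start, title) records. The following is only a sketch: `Chapter`, `parse_chpl_payload` and `build_chpl_payload` are hypothetical names, not part of mutagen. It assumes the same offsets the patch in this comment reads (8 leading bytes, a 1-byte chapter count, then an 8-byte start, a 1-byte title length and the title for each record), and it assumes the raw start value is in units of 100 ns, which is what dividing by 10000 and then by a timescale of 1000 amounts to.

```python
import struct
from collections import namedtuple

# Hypothetical record type; start is in seconds.
Chapter = namedtuple("Chapter", ["start", "title"])

# Assumption: raw start values are counted in units of 100 ns.
UNITS_PER_SECOND = 10_000_000


def parse_chpl_payload(data):
    """Parse a chpl payload (bytes after the atom header) into Chapters."""
    count = data[8]  # 1-byte chapter count after 8 leading bytes
    pos = 9
    chapters = []
    for _ in range(count):
        (raw_start,) = struct.unpack(">Q", data[pos:pos + 8])
        pos += 8
        title_len = data[pos]
        pos += 1
        title = data[pos:pos + title_len].decode("utf-8")
        pos += title_len
        chapters.append(Chapter(raw_start / UNITS_PER_SECOND, title))
    return chapters


def build_chpl_payload(chapters):
    """Build a synthetic payload in the same layout, for demonstration only."""
    out = bytearray(8)  # leading bytes, left as zero
    out.append(len(chapters))
    for start, title in chapters:
        encoded = title.encode("utf-8")
        raw_start = int(round(start * UNITS_PER_SECOND))
        out += struct.pack(">QB", raw_start, len(encoded)) + encoded
    return bytes(out)


payload = build_chpl_payload([(0.0, "001"), (1639.096, "002")])
for chapter in parse_chpl_payload(payload):
    print(chapter.start, chapter.title)
```

Returning plain tuples keeps the result easy to iterate, sort and compare; a real API would probably want to hang such a list off the file object.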
Code
diff --git a/mutagen/mp4/__init__.py b/mutagen/mp4/__init__.py
index d51e3be..2a1f950 100644
--- a/mutagen/mp4/__init__.py
+++ b/mutagen/mp4/__init__.py
@@ -25,6 +25,7 @@ were all consulted.
 import struct
 import sys
+from datetime import timedelta, datetime
 from mutagen import FileType, Tags, StreamInfo, PaddingInfo
 from mutagen._constants import GENRES
@@ -889,6 +890,103 @@ class MP4Tags(DictProxy, Tags):
         return u"\n".join(values)
+class MP4Chapters(object):
+    """MP4Chapters()
+
+    Reads the chapter list stored in the moov.udta.chpl atom.
+    """
+
+    def __init__(self, *args, **kwargs):
+        self._timescale = None
+        self._duration = None
+        super(MP4Chapters, self).__init__()
+        if args or kwargs:
+            self.load(*args, **kwargs)
+
+    def load(self, atoms, fileobj):
+        try:
+            mvhd = atoms.path(b"moov", b"mvhd")[-1]
+        except KeyError as key:
+            raise MP4MetadataError(key)
+
+        ok, data = mvhd.read(fileobj)
+        self._parse_mvhd(mvhd, data)
+
+        if not self._timescale:
+            raise MP4MetadataError("Unable to get timescale")
+
+        try:
+            chpl = atoms.path(b"moov", b"udta", b"chpl")[-1]
+        except KeyError as key:
+            raise MP4MetadataError(key)
+
+        ok, data = chpl.read(fileobj)
+        self._parse_chpl(chpl, data)
+
+    @classmethod
+    def _can_load(cls, atoms):
+        return b"moov.udta.chpl" in atoms
+
+    def _parse_mvhd(self, atom, data):
+        version = data[0]
+        flags = struct.unpack(">I", b'\x00' + data[1:4])[0]
+        print(f'mvhd: version: {version}, flags: {flags}')
+
+        basedate = datetime(year=1904, month=1, day=1)  # MP4 epoch
+        pos = 4  # skip version (1 byte) and flags (3 bytes)
+        if version == 0:
+            created = struct.unpack(">l", data[pos:pos + 4])[0]
+            pos += 4
+
+            modified = struct.unpack(">l", data[pos:pos + 4])[0]
+            pos += 4
+
+            self._timescale = struct.unpack(">l", data[pos:pos + 4])[0]
+            pos += 4
+
+            self._duration = struct.unpack(">l", data[pos:pos + 4])[0]
+            pos += 4
+        elif version == 1:
+            created = struct.unpack(">q", data[pos:pos + 8])[0]
+            pos += 8
+
+            modified = struct.unpack(">q", data[pos:pos + 8])[0]
+            pos += 8
+
+            self._timescale = struct.unpack(">l", data[pos:pos + 4])[0]
+            pos += 4
+
+            self._duration = struct.unpack(">q", data[pos:pos + 8])[0]
+            pos += 8
+
+        print(f'timescale: {self._timescale}, '
+              f'duration: {self._duration} ({timedelta(seconds=self._duration/self._timescale)}), '
+              f'created: {created} ({basedate + timedelta(seconds=created)}), '
+              f'modified: {modified} ({basedate + timedelta(seconds=modified)})')
+
+    def _parse_chpl(self, atom, data):
+        version = data[4]
+        flags = struct.unpack(">I", b'\x00' + data[5:8])[0]
+        chapters = data[8]
+        print(f'chpl: version: {version}, flags: {flags}, chapters: {chapters}')
+
+        pos = 9
+        for i in range(chapters):
+            start = struct.unpack(">Q", data[pos:pos+8])[0] / 10000
+            pos += 8
+
+            if start > self._duration:  # both treated as timescale units
+                print(start, ">", self._duration)
+
+            title_len = data[pos]
+            pos += 1
+
+            title = data[pos:pos+title_len].decode()
+            pos += title_len
+
+            print(i+1, timedelta(seconds=start/self._timescale), title_len, title)
+
+
 class MP4Info(StreamInfo):
     """MP4Info()
@@ -1044,6 +1142,7 @@ class MP4(FileType):
     """
     MP4Tags = MP4Tags
+    MP4Chapters = MP4Chapters
     _mimes = ["audio/mp4", "audio/x-m4a", "audio/mpeg4", "audio/aac"]
@@ -1076,6 +1175,16 @@ class MP4(FileType):
             except Exception as err:
                 reraise(MP4MetadataError, err, sys.exc_info()[2])
+        if not MP4Chapters._can_load(atoms):
+            self.chapters = None
+        else:
+            try:
+                self.chapters = self.MP4Chapters(atoms, fileobj)
+            except error:
+                raise
+            except Exception as err:
+                reraise(MP4MetadataError, err, sys.exc_info()[2])
+
     @property
     def _padding(self):
         if self.tags is None:
Output
mvhd: version: 0, flags: 0
timescale: 1000, duration: 74106811 (20:35:06.811000), created: 0 (1904-01-01 00:00:00), modified: -699636823 (1881-10-29 08:26:17)
chpl: version: 0, flags: 0, chapters: 54
1 0:00:00 3 001
2 0:27:19.096000 3 002
3 0:50:17.200000 3 003
4 1:11:47.069000 3 004
5 1:33:47.263000 3 005
6 1:55:46.156000 3 006
7 2:19:18.301000 3 007
8 2:41:02.287000 3 008
9 3:01:51.660000 3 009
10 3:30:56.036000 3 010
11 3:53:41.137000 3 011
12 4:14:47.925000 3 012
13 4:37:21.973000 3 013
14 5:03:07.772000 3 014
15 5:27:00.072000 3 015
16 5:54:01.382000 3 016
17 6:16:25.724000 3 017
18 6:39:24.107000 3 018
19 7:01:47.521000 3 019
20 7:24:34.108000 3 020
21 7:47:56.640000 3 021
22 8:13:40.999000 3 022
23 8:33:53.638000 3 023
24 8:56:47.005000 3 024
25 9:17:51.193000 3 025
26 9:42:07.966000 3 026
27 9:59:57.059000 3 027
28 10:21:24.466000 3 028
29 10:45:13.515000 3 029
30 11:01:41.942000 3 030
31 11:25:08.932000 3 031
32 11:45:56.262000 3 032
33 12:08:02.446000 3 033
34 12:29:29.343000 3 034
35 12:50:25.914000 3 035
36 13:14:11.990000 3 036
37 13:36:05.218000 3 037
38 13:59:42.146000 3 038
39 14:22:47.542000 3 039
40 14:45:44.392000 3 040
41 15:07:25.731000 3 041
42 15:30:43.572000 3 042
43 15:53:23.147000 3 043
44 16:16:37.552000 3 044
45 16:39:06.678000 3 045
46 17:01:30.277000 3 046
47 17:24:10.781000 3 047
48 17:46:17.151000 3 048
49 18:09:11.540000 3 049
50 18:34:40.992000 3 050
51 19:01:47.643000 3 051
52 19:25:32.234000 3 052
53 19:49:09.626000 3 053
54 20:12:09.170000 3 054
The mvhd created/modified timestamps are off because they don't seem to be encoded as signed integers but as unsigned ones, at least for version 0.
find ~/Audiobooks -name "*.m4b" -exec python tools/mutagen-inspect "{}" \; | grep timescale
timescale: 1000, duration: 69087493 (19:11:27.493000), created: 0 (1904-01-01 00:00:00), modified: 3528286209 (2015-10-21 15:30:09)
timescale: 1000, duration: 71417058 (19:50:17.058000), created: 0 (1904-01-01 00:00:00), modified: 3528295994 (2015-10-21 18:13:14)
timescale: 1000, duration: 9290676 (2:34:50.676000), created: 0 (1904-01-01 00:00:00), modified: 3528297241 (2015-10-21 18:34:01)
timescale: 1000, duration: 66388544 (18:26:28.544000), created: 0 (1904-01-01 00:00:00), modified: 3528296642 (2015-10-21 18:24:02)
timescale: 1000, duration: 8976162 (2:29:36.162000), created: 0 (1904-01-01 00:00:00), modified: 3503637776 (2015-01-09 08:42:56)
timescale: 1000, duration: 60292981 (16:44:52.981000), created: 0 (1904-01-01 00:00:00), modified: 3528297200 (2015-10-21 18:33:20)
timescale: 1000, duration: 8773707 (2:26:13.707000), created: 0 (1904-01-01 00:00:00), modified: 3528304246 (2015-10-21 20:30:46)
timescale: 1000, duration: 71924321 (19:58:44.321000), created: 0 (1904-01-01 00:00:00), modified: 3588638494 (2017-09-19 04:01:34)
timescale: 1000, duration: 8948739 (2:29:08.739000), created: 0 (1904-01-01 00:00:00), modified: 3597408606 (2017-12-29 16:10:06)
timescale: 1000, duration: 74106811 (20:35:06.811000), created: 0 (1904-01-01 00:00:00), modified: 3595330473 (2017-12-05 14:54:33)
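The signed-versus-unsigned difference in the mvhd timestamps can be reproduced with `struct` alone. This is a small sketch using the `modified` value from the last line of the output above; `mp4_time` is a hypothetical helper, and the 1904-01-01 base date is the one used in the patch.

```python
import struct
from datetime import datetime, timedelta

MP4_EPOCH = datetime(1904, 1, 1)  # base date used in the patch


def mp4_time(seconds):
    """Convert seconds since the 1904 base date to a datetime."""
    return MP4_EPOCH + timedelta(seconds=seconds)


# The same four bytes, read once as signed and once as unsigned.
raw = struct.pack(">L", 3595330473)
as_signed = struct.unpack(">l", raw)[0]
as_unsigned = struct.unpack(">L", raw)[0]

print(as_signed, mp4_time(as_signed))
print(as_unsigned, mp4_time(as_unsigned))
```

Read as signed, the value comes out as -699636823, which is the negative number seen in the first output; read as unsigned, it lands in 2017.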
I'm very interested in this feature. I'm developing an audiobook app for Linux that uses mutagen to read metadata. What needs to be done to get the code from @mweinelt into mutagen?
What happened with this, by the way? I hacked some read-only code together a while back, but it looks like support has been added and not yet released?
I'd much prefer using mutagen's built-in code. It's also worth mentioning that many files only use QuickTime-style chapters; at the moment my "solution" for those is the MP4v2 package.
Shouldn't this issue be closed by #398?
> Shouldn't this issue be closed by #398?

That's about Nero chapters.
Thanks to @jmeosbn for writing the MP4Chapters code. Once I installed the master branch rather than the latest release, which predates this patch, it seemed to work fine for reading chapters on all the MP4 files I tried. (Though I wasted a lot of time reading the online documentation and trying to make MP4Chapters work before I looked at the released code, the master branch and their dates, and realized this functionality was not present in the most recent release!)
Which brings me to my point: I too am using the ancient MP4v2 package for a long-standing Python project, and finally grew sufficiently frustrated with Unicode issues that I decided to rip it out and replace it with mutagen. And it seems mutagen can do everything I need in a more convenient and modern fashion.
Except for creating and writing chapters (like mp4v2's mp4chaps can). I suspect I could figure it out and do a pull request, but I bet @jmeosbn could do it much faster and more reliably. Could I beg such support? I'll do plenty of testing!
I didn't write that code (mine is also read only) but I did have the same experience as you in realising it was in a different branch.
I plan to eventually redo my code using that then see about writing chapters. At the moment - rather than use the library - I've just scripted export/import from the mp4chaps cli tool.
Thanks for the update, @jmeosbn.
I too use the mp4chaps (and mp4tags) tools from MP4v2 and was in the process of ripping out this dependency and replacing it with mutagen, but found that just the chapter writing functionality was missing. So I guess I'll keep using mp4chaps for now too.
But if anybody who already understands the infrastructure were to add writing support, they'd have at least two grateful users!
One tip for that mp4chaps (and mp4tags) workaround: these programs don't handle non-ASCII filenames and tags well, at least when called from Python 3 through the various subprocess interfaces. For mp4chaps I finally implemented a workaround that first renames the file to an ASCII name and renames it back afterwards. It's a temporary hack and one more reason I wish we had chapter writing functionality in mutagen.
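The rename-and-rename-back workaround could look roughly like the sketch below. `temporary_ascii_name` is a hypothetical helper, not something from mutagen or MP4v2. It only makes the file name ASCII, not the directory, and it leaves the actual call to the external tool to the caller so no command-line flags are assumed here.

```python
import os
import uuid
from contextlib import contextmanager


@contextmanager
def temporary_ascii_name(path):
    """Rename path to an ASCII-only name in the same directory, then restore it."""
    directory, name = os.path.split(os.path.abspath(path))
    ext = os.path.splitext(name)[1]
    # Keep the extension only if it is itself ASCII.
    safe_ext = ext if ext.isascii() else ""
    temp_path = os.path.join(directory, "tmp-" + uuid.uuid4().hex + safe_ext)
    os.rename(path, temp_path)
    try:
        yield temp_path
    finally:
        # Restore the original name even if the external tool failed.
        os.rename(temp_path, path)
```

Inside the `with temporary_ascii_name(path) as safe_path:` block, the external tool would be run on `safe_path`, for example through `subprocess.run`. Staying in the same directory keeps the rename on one filesystem, so it does not copy the file.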
Any update on this?