Using CTOC Entry Count byte causes issue with large number of entries

Open harrisi opened this issue 10 months ago • 8 comments

When reading the CTOC frame, the entry count is used to decide how many child element IDs to read. Since the entry count is a single unsigned 8-bit integer, it overflows when there are more than 255 chapters.

This is an issue with the spec, really, since there's plenty of room in the frame to list more than 255 chapters. Different platforms handle this differently; FFmpeg and mpv, for example, seem to ignore the entry count byte and just read the actual list of chapters.
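
To make the overflow concrete, here is a minimal Python sketch. It assumes a writer that simply keeps the low 8 bits of the chapter count; `stored_entry_count` is a hypothetical helper, not part of node-id3 or any other library.

```python
def stored_entry_count(chapter_count: int) -> int:
    """Value that ends up in the one-byte entry count field,
    assuming the writer truncates to the low 8 bits."""
    return chapter_count & 0xFF


for chapters in (255, 256, 257):
    # A reader that trusts this byte sees far fewer chapters than were written.
    print(chapters, "chapters written, entry count byte =", stored_entry_count(chapters))
```

Under that assumption, a reader that trusts the byte would stop after one child element for a file with 257 chapters.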

harrisi avatar Oct 12 '23 16:10 harrisi

Interesting problem, some initial thoughts I'm having:

  • The entry count should definitely be capped at 255 so the number cannot overflow
  • Should we still write more than 255 entries?
  • Can we really ignore the number?
    • If there is an unknown number of null-terminated text entries, how do you know which one is the last?
    • After the chapters, there can still be sub-frames. How do we detect that?
CTOC
...
255
FIRST CHAPTER\0		// chapter 1
SECOND CHAPTER\0	// chapter 2
THIRD CHAPTER\0		// chapter 3
TIT2\0\0\020HELLO\0	// sub-frame (title, partly written down)

This could also be read as

CTOC
...
255
FIRST CHAPTER\0		// chapter 1
SECOND CHAPTER\0	// chapter 2
THIRD CHAPTER\0		// chapter 3
TIT2\0			// chapter 4
\0			// chapter 5 (would be wrong of course, as two empty strings are not allowed as CHAP IDs, but the bytes could just as well be different)
\0			// chapter 6
20HELLO\0		// chapter 7

I'm not sure if there is an actual clean way to ignore the length
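
The ambiguity can be sketched in Python. Both helper names are hypothetical, and the TIT2 sub-frame is a made-up complete frame (10-byte header, encoding byte, "HELLO") rather than the partial one written above. The only difference between the two readers is whether the entry count is trusted.

```python
def read_child_ids(data: bytes, count: int) -> tuple[list[bytes], bytes]:
    """Read exactly `count` null-terminated IDs; return them and the remaining bytes."""
    ids, pos = [], 0
    for _ in range(count):
        end = data.index(b"\x00", pos)
        ids.append(data[pos:end])
        pos = end + 1
    return ids, data[pos:]


def read_all_strings(data: bytes) -> list[bytes]:
    """Ignore the entry count and split on NUL until the data runs out."""
    ids, pos = [], 0
    while pos < len(data):
        end = data.find(b"\x00", pos)
        if end == -1:  # trailing bytes without a terminator
            ids.append(data[pos:])
            break
        ids.append(data[pos:end])
        pos = end + 1
    return ids


chapters = b"FIRST CHAPTER\x00SECOND CHAPTER\x00THIRD CHAPTER\x00"
# Illustrative sub-frame: ID, 4-byte size (6), 2 flag bytes, encoding byte, text.
tit2 = b"TIT2" + (6).to_bytes(4, "big") + b"\x00\x00" + b"\x00HELLO"
payload = chapters + tit2

ids, rest = read_child_ids(payload, 3)
print(ids, rest[:4])              # three chapters, then the TIT2 sub-frame
print(read_all_strings(payload))  # the sub-frame bytes are misread as extra "chapters"
```

With the count, the sub-frame is left intact; without it, the frame ID, size and flag bytes all get split into bogus child IDs.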

Zazama avatar Oct 12 '23 16:10 Zazama

Yeah, it's awkward. jsmediatags also gets this wrong. mpv gets it "right", but I can't find where they're handling it right now. I expect there's an issue with the TIT2 title like you mentioned.

I think instead of entry count you can get the size of the frame and read that many bytes (minus header size and whatever else).

harrisi avatar Oct 12 '23 16:10 harrisi

> I think instead of entry count you can get the size of the frame and read that many bytes (minus header size and whatever else).

The sub-frames are included in the CTOC size, so I don't think it is possible like that. It would be interesting to see how other implementations handle that.

Zazama avatar Oct 12 '23 16:10 Zazama

I have a file with 255 chapters and one with 257, and the CTOCs for them are:

CTOC
$00 00 06 91 // size
$00 00 // flags
toc // element id
$00 03 // element id terminator, then CTOC flags
$FF // entry count
chp0 $00
chp1 $00
// ...
chp254 $00
CHAP
// ...

and

CTOC
$00 00 06 9F // size
$00 00 // flags
toc // element id
$00 03 // element id terminator, then CTOC flags
$01 // entry count
chp0 $00
chp1 $00
// ...
chp256 $00
CHAP
// ...

As you can see, the size is 14 bytes larger ($69F - $691), since chp255$00chp256$00 is 14 bytes, so reading by frame size seems to be an option.
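
The arithmetic can be checked with a short Python sketch. It assumes the CTOC body is laid out as element ID plus terminator, one flags byte, one entry count byte, then the null-terminated child IDs, with no embedded sub-frames; `ctoc_body_size` is a hypothetical helper.

```python
def ctoc_body_size(element_id: bytes, child_ids: list[bytes]) -> int:
    """Size of a CTOC frame body without embedded sub-frames."""
    header = len(element_id) + 1 + 1 + 1  # ID + NUL, flags byte, entry count byte
    return header + sum(len(cid) + 1 for cid in child_ids)  # each child ID + NUL


ids_255 = [b"chp%d" % i for i in range(255)]  # chp0 .. chp254
ids_257 = [b"chp%d" % i for i in range(257)]  # chp0 .. chp256

size_255 = ctoc_body_size(b"toc", ids_255)
size_257 = ctoc_body_size(b"toc", ids_257)
print(hex(size_255), hex(size_257), size_257 - size_255)
```

Under those assumptions the computed sizes match the two size fields in the dumps, which suggests neither file has sub-frames inside its CTOC frame.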

harrisi avatar Oct 12 '23 16:10 harrisi

Yes, but there can be other frames inside the CTOC frame, not just after it. In my example above, the size of the full TIT2 frame would be included in the CTOC frame's size, which leaves open the problem of detecting whether there are any frames inside the CTOC frame or not.

See here: https://mutagen-specs.readthedocs.io/en/latest/_images/CTOCFrame-1.0.png

Zazama avatar Oct 12 '23 16:10 Zazama

Oh right, I missed the sub-frame part of your first example and was just thinking about chapters whose IDs look like frame IDs, which I'm not actually sure is allowed. If it's not, then you could find the sub-frames, right?

Either way, yeah, I see what you're saying. I think realistically the problem is actually with the tag writers. FFmpeg uses an unsigned int for the entry count byte as far as I can tell. Really I think the spec implies that only 255 chapters are supported, but it doesn't state that explicitly, and it seems popular tag writers don't honor it.
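
For illustration, a writer that does honor the limit might look like the Python sketch below. `build_ctoc_body` and `MAX_CTOC_ENTRIES` are hypothetical names, not FFmpeg's or node-id3's code, and the sketch assumes a body layout of element ID, flags byte, entry count byte and null-terminated child IDs, with no sub-frames.

```python
MAX_CTOC_ENTRIES = 0xFF  # largest value the one-byte entry count can hold


def build_ctoc_body(element_id: bytes, flags: int, child_ids: list[bytes]) -> bytes:
    """Serialize a CTOC body, refusing to overflow the entry count byte."""
    if len(child_ids) > MAX_CTOC_ENTRIES:
        raise ValueError(
            f"CTOC can reference at most {MAX_CTOC_ENTRIES} child elements, "
            f"got {len(child_ids)}"
        )
    body = element_id + b"\x00" + bytes([flags, len(child_ids)])
    for child_id in child_ids:
        body += child_id + b"\x00"
    return body


body = build_ctoc_body(b"toc", 0x03, [b"chp0", b"chp1"])
print(body)
```

Raising an error is only one option; a writer could instead truncate the list, or spread the chapters over several CTOC frames.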

harrisi avatar Oct 12 '23 17:10 harrisi

I think they are allowed; the spec says the IDs must be unique only with respect to the other element IDs. We also can't check whether the start of a string is a valid ID3 frame ID, because we want to support keeping unimplemented frames, i.e. frames node-id3 does not know about. I'll have to think about it a little, maybe there is a better solution.

Zazama avatar Oct 12 '23 17:10 Zazama

> I'll have to think about it a little, maybe there is a better solution

Yeah, me too. I still think you're actually handling this correctly on read by reading entryCount number of entries, it's just that tag writers will happily overflow and write more than that. I'll try to figure out how mpv handles this as well.

harrisi avatar Oct 12 '23 17:10 harrisi