go-astisub

Optional data after WebVTT file signature isn't respected

Status: Open · opened by nakkamarra · 2 comments

When I read a WebVTT file with the ReadFromWebVTT function, do some work on the captions, and then write them back out with WriteToWebVTT, the optional text that follows the WEBVTT file signature on the first line is dropped.

Example:

    captions, err := astisub.ReadFromWebVTT(reader) // reader: the input file (an io.Reader)
    if err != nil {
        panic(err)
    }
    // ... do some work on captions here
    if err := captions.WriteToWebVTT(w); err != nil { // w: the output (an io.Writer)
        panic(err)
    }

file.vtt:

    WEBVTT - Some optional comment here

    1
    00:00:00.500 --> 00:00:02.000
    The Web is always changing

    2
    00:00:02.500 --> 00:00:04.300
    and the way we access it is changing

output (note the bare WEBVTT on the first line):

    WEBVTT

    1
    00:00:00.500 --> 00:00:02.000
    The Web is always changing

    2
    00:00:02.500 --> 00:00:04.300
    and the way we access it is changing
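The dropped text could also be put back after writing by patching the first line of the serialized output. This is a minimal sketch that assumes the whole output is buffered in memory and that the writer starts it with a bare `WEBVTT` line, as in the output above. `restoreSignature` is a hypothetical helper, and `signature` is assumed to already be a valid first line, for example one captured from the input before parsing.

```go
package main

import (
	"bytes"
	"fmt"
)

// restoreSignature replaces the first line of serialized WebVTT output (a bare
// "WEBVTT" in the output above) with the original signature line, e.g.
// "WEBVTT - Some optional comment here". Everything after the first line is
// kept byte for byte. Hypothetical helper: not part of astisub.
func restoreSignature(written []byte, signature string) []byte {
	if !bytes.HasPrefix(written, []byte("WEBVTT")) {
		return written // not the shape we expect; leave it untouched
	}
	end := bytes.IndexByte(written, '\n')
	if end < 0 {
		return []byte(signature) // single-line output: it is all signature
	}
	lineEnd := end
	if lineEnd > 0 && written[lineEnd-1] == '\r' {
		lineEnd-- // keep the CR of a CRLF terminator
	}
	out := make([]byte, 0, len(signature)+len(written)-lineEnd)
	out = append(out, signature...)
	out = append(out, written[lineEnd:]...)
	return out
}

func main() {
	// Output in the same shape as above: a bare "WEBVTT" signature line.
	written := []byte("WEBVTT\n\n1\n00:00:00.500 --> 00:00:02.000\nThe Web is always changing\n")
	fixed := restoreSignature(written, "WEBVTT - Some optional comment here")
	fmt.Print(string(fixed))
}
```

With astisub this would mean writing into a `bytes.Buffer` via `captions.WriteToWebVTT(&buf)` and then sending `restoreSignature(buf.Bytes(), sig)` to the real destination. It is a post-processing hack, though; keeping the text on the parsed `Subtitles` and emitting it from the writer would be the cleaner fix.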

This isn't a huge deal; dropping the text doesn't seem to cause any parsing issues. But I would expect it to be preserved, since it's valid according to the WebVTT spec:

A WebVTT file body consists of the following components, in the following order:

  1. An optional U+FEFF BYTE ORDER MARK (BOM) character.
  2. The string "WEBVTT".
  3. Optionally, either a U+0020 SPACE character or a U+0009 CHARACTER TABULATION (tab) character followed by any number of characters that are not U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR) characters.
  4. Exactly one WebVTT line terminator to terminate the line with the file magic and separate it from the rest of the body.
  5. Zero or more WebVTT metadata headers.
  6. One or more WebVTT line terminators to terminate the header block and separate the cues from the file header.
  7. Zero or more WebVTT cues and WebVTT comments separated from each other by one or more WebVTT line terminators.
  8. Zero or more WebVTT line terminators.
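Steps 1 to 3 above are simple enough to check by hand. The sketch below is a hypothetical `parseSignatureLine`, not astisub's actual parser: it validates a first line whose terminator has already been stripped and returns the optional text after the space or tab separator, which is what a reader would need to keep around so that a writer could emit it again.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSignatureLine checks a WebVTT first line (terminator already stripped)
// against steps 1-3 of the file-body grammar:
//  1. an optional U+FEFF byte order mark,
//  2. the string "WEBVTT",
//  3. optionally a space or tab followed by any characters other than LF/CR.
// It returns the text after the separator (empty if absent) and whether the
// line is a valid signature line. Hypothetical helper, not astisub API.
func parseSignatureLine(line string) (text string, ok bool) {
	line = strings.TrimPrefix(line, "\uFEFF") // step 1: optional BOM
	if !strings.HasPrefix(line, "WEBVTT") {   // step 2: file magic
		return "", false
	}
	rest := line[len("WEBVTT"):]
	if rest == "" {
		return "", true // bare "WEBVTT"
	}
	// Step 3: the separator must be exactly one space or tab.
	if rest[0] != ' ' && rest[0] != '\t' {
		return "", false // e.g. "WEBVTTX" is not a valid signature line
	}
	text = rest[1:]
	if strings.ContainsAny(text, "\r\n") {
		return "", false // CR/LF would end the line, so they cannot appear here
	}
	return text, true
}

func main() {
	lines := []string{
		"WEBVTT",
		"WEBVTT - Some optional comment here",
		"\uFEFFWEBVTT\tkind: captions",
		"WEBVTTX",
	}
	for _, l := range lines {
		text, ok := parseSignatureLine(l)
		fmt.Printf("%q -> text=%q valid=%v\n", l, text, ok)
	}
}
```

For the example file, the returned text is `- Some optional comment here`: the single space is the separator, and everything after it (including the dash) is the optional text that currently gets lost.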

Posted by nakkamarra on Aug 12, 2024 at 23:08