Support for WebVTT files (text/vtt)

Open AleksandrHovhannisyan opened this issue 1 year ago • 2 comments

First, thanks so much for this package! We've been using it at work to validate files uploaded by users and it works as expected for the majority of our use cases. There is one edge case where it doesn't currently validate WebVTT files (MIME type text/vtt, for captions shown in a video element's <track>).

The magic numbers for VTT files are as follows, according to the W3C document "WebVTT: The Web Video Text Tracks Format":

WebVTT files all begin with one of the following byte sequences (where "EOF" means the end of the file):

EF BB BF 57 45 42 56 54 54 0A
EF BB BF 57 45 42 56 54 54 0D
EF BB BF 57 45 42 56 54 54 20
EF BB BF 57 45 42 56 54 54 09
EF BB BF 57 45 42 56 54 54 EOF
57 45 42 56 54 54 0A
57 45 42 56 54 54 0D
57 45 42 56 54 54 20
57 45 42 56 54 54 09
57 45 42 56 54 54 EOF

(An optional UTF-8 BOM, the ASCII string "WEBVTT", and finally a space, tab, line break, or the end of the file.)

Would it be possible to support this? If so, I'd be happy to help or put in a PR.

AleksandrHovhannisyan, Aug 14 '24 19:08

In my opinion, that is in scope.

Please note that we already have the BOM covered in a generic way:

https://github.com/sindresorhus/file-type/blob/988bf4bc9f9bc98e8f3360da4dfa36e5caa455b3/core.js#L251-L255

So ignore the magic numbers with the BOM prefix (EF BB BF); those will be covered automatically.

I suggest triggering on WEBVTT, and possibly matching the character that follows it.
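
For illustration only, here is a minimal standalone sketch of that check. It is not the file-type implementation: `isWebVtt`, `startsWith`, and the constants are hypothetical names, and the BOM is skipped inline here only so the example runs on its own (in file-type the generic BOM handling linked above would already take care of it). It also assumes `bytes` holds the whole file, so that reaching the end of the buffer really means EOF.

```javascript
// Hypothetical helper names; none of these are part of file-type's API.
const UTF8_BOM = [0xEF, 0xBB, 0xBF];
const WEBVTT = [0x57, 0x45, 0x42, 0x56, 0x54, 0x54]; // ASCII "WEBVTT"
const TERMINATORS = new Set([0x0A, 0x0D, 0x20, 0x09]); // LF, CR, space, tab

// True if `signature` appears in `bytes` starting at `offset`.
function startsWith(bytes, signature, offset = 0) {
  if (bytes.length < offset + signature.length) {
    return false;
  }
  return signature.every((value, index) => bytes[offset + index] === value);
}

// Assumption: `bytes` is a Uint8Array (or array of numbers) containing the
// entire file, so running out of bytes after "WEBVTT" counts as EOF.
function isWebVtt(bytes) {
  // Skip an optional UTF-8 BOM.
  const offset = startsWith(bytes, UTF8_BOM) ? UTF8_BOM.length : 0;

  if (!startsWith(bytes, WEBVTT, offset)) {
    return false;
  }

  // "WEBVTT" must be followed by a line break, space, tab, or end of file.
  const next = offset + WEBVTT.length;
  return next === bytes.length || TERMINATORS.has(bytes[next]);
}
```

Checking the byte after WEBVTT avoids matching unrelated text that merely starts with those six letters. If only a truncated sample of the file is available, the end-of-buffer case cannot be distinguished from a real EOF, so a detector working on partial buffers may want to treat it differently.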

Borewit, Aug 14 '24 19:08

Thanks! That makes sense. I'll work on this and put up a PR.

AleksandrHovhannisyan, Aug 14 '24 19:08