VTT writer performance
Hi,
The source is an SCC file with 1033 captions.
- Converting it to TTML is fine
- Converting it to VTT is very slow (10 s on a MacBook Pro)
- Converting it to TTML and then to VTT is also very slow (10 s on a MacBook Pro)
Top 3 time-consuming operations:
- _process_element (isd.py:413): 35.644 s cumulative time, called 2,165,274 times; recursive (ncalls reported as 2165274/48445)
- _compute_styles (isd.py:400): 11.755 s cumulative time, called 150,222 times
- set_style (model.py:339): 5.424 s cumulative time, called 10,642,028 times
Please let me know if you need more details.
The current algorithm performs poorly when the input document contains both a large number of regions with indefinite temporal intervals and a large number of captions/subtitles: all regions must be visited for each caption/subtitle (an NxM problem).
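To make the NxM cost concrete, here is a minimal sketch on a deliberately simplified, hypothetical model (the `Region` class and both functions are illustrative, not ttconv's API): each region carries the `[begin, end)` intervals during which content is selected into it. The naive version checks every region at every significant time; the indexed version buckets each content interval into the significant times it covers using binary search, so the work is roughly proportional to the content that is actually active rather than to regions × times.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


# Hypothetical, simplified model: NOT ttconv's data structures.
@dataclass
class Region:
    name: str
    # [begin, end) intervals during which content is selected into the region;
    # use float("inf") as end for an indefinite interval
    content_intervals: List[Tuple[float, float]] = field(default_factory=list)


def active_regions_naive(regions: List[Region], times: List[float]) -> Dict[float, List[str]]:
    """Visit every region at every significant time: O(N x M)."""
    return {
        t: [r.name for r in regions if any(b <= t < e for b, e in r.content_intervals)]
        for t in times
    }


def active_regions_indexed(regions: List[Region], times: List[float]) -> Dict[float, List[str]]:
    """Bucket each content interval into the significant times it covers."""
    times_sorted = sorted(set(times))
    buckets: Dict[float, List[str]] = {t: [] for t in times_sorted}
    for r in regions:  # assumes region names are unique
        for b, e in r.content_intervals:
            lo = bisect.bisect_left(times_sorted, b)  # first time >= begin
            hi = bisect.bisect_left(times_sorted, e)  # first time >= end
            for t in times_sorted[lo:hi]:
                # skip if overlapping intervals of this region already added it
                if not buckets[t] or buckets[t][-1] != r.name:
                    buckets[t].append(r.name)
    return buckets
```

Both functions return the same mapping; only the indexed one avoids touching regions that have no content at a given time.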
A couple of options come to mind:
- reduce the number of regions generated when reading an SCC document by coalescing regions with similar dimensions (probably a good idea in any event)
- optimize the ISD generation algorithm (probably following a pattern similar to that at https://github.com/sandflow/imscJS/commit/b728b682f28fac4431fffe0588d9d0b9574a3f3d)
- add multi-processing support (not sure it is entirely worth the effort)
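The first option could look roughly like the sketch below, assuming each region is described by a hypothetical `(origin_x, origin_y, extent_width, extent_height)` tuple in percent of the root container and that a fixed tolerance is an acceptable notion of "similar dimensions" (both are assumptions, not what the SCC reader does today). It maps every region id to a canonical region id, so captions can be retargeted to far fewer regions.

```python
from typing import Dict, List, Tuple

# Hypothetical geometry in percent of the root container:
# (origin_x, origin_y, extent_width, extent_height). Not ttconv's model.
Geometry = Tuple[float, float, float, float]


def coalesce_regions(regions: Dict[str, Geometry], tolerance: float = 1.0) -> Dict[str, str]:
    """Map each region id to a canonical region id.

    A region reuses the first canonical region whose geometry agrees with it
    within `tolerance` on every coordinate; otherwise it becomes canonical.
    """
    canonical: List[Tuple[str, Geometry]] = []
    mapping: Dict[str, str] = {}
    for region_id, geom in regions.items():
        for canon_id, canon_geom in canonical:
            if all(abs(a - b) <= tolerance for a, b in zip(geom, canon_geom)):
                mapping[region_id] = canon_id
                break
        else:  # no similar canonical region found
            canonical.append((region_id, geom))
            mapping[region_id] = region_id
    return mapping
```

This greedy first-match grouping is order-dependent and not transitive (A~B and B~C does not imply A and C share a region); snapping coordinates to a grid is an alternative if a stable result matters more.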
The first option definitely makes sense. Option 2 would be nice too.
Maybe caching computed styles could help too, if applicable.
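As a rough illustration of style caching, the sketch below memoizes a hypothetical `compute_style` with `functools.lru_cache`. It assumes style sets can be represented as immutable `frozenset`s of `(property, value)` pairs (so they are hashable cache keys) and uses an illustrative subset of inherited properties; neither reflects ttconv's actual `_compute_styles`. Whether this pays off depends on how often identical (parent style, specified style) pairs recur, which seems plausible for SCC input with many similarly styled captions.

```python
from functools import lru_cache
from typing import FrozenSet, Tuple

# Hypothetical representation: an immutable set of (property, value) pairs,
# hashable so it can serve as a cache key. Not ttconv's API.
StyleSet = FrozenSet[Tuple[str, str]]

# Illustrative subset of inherited properties (assumption)
INHERITED = frozenset({"color", "fontFamily", "fontSize"})


@lru_cache(maxsize=None)
def compute_style(parent_computed: StyleSet, specified: StyleSet) -> StyleSet:
    """Inherit inheritable properties from the parent, then apply the
    element's specified styles; identical inputs are computed only once."""
    result = {p: v for p, v in parent_computed if p in INHERITED}
    result.update(dict(specified))
    # return an immutable value so cached results cannot be mutated by callers
    return frozenset(result.items())
```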