ttconv VTT Writer performance

Hi,

The source is an SCC file with 1033 captions.

Converting it to TTML is fine.
Converting it to VTT is very slow (10 s on a MacBook Pro).
Converting it to TTML and then to VTT is also very slow (10 s on a MacBook Pro).

source.scc.zip

Top 3 Time-Consuming Operations:

_process_element (isd.py:413)

Consumed 35.644 s cumulative time. Called 2,165,274 times. Recursive function (note the ncalls format: 2165274/48445, i.e. total calls/primitive calls).

_compute_styles (isd.py:400)

Consumed 11.755 s cumulative time. Called 150,222 times.

set_style (model.py:339)

Consumed 5.424 s cumulative time. Called 10,642,028 times.
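For anyone who wants to collect a comparable listing, here is a minimal sketch using the standard-library cProfile and pstats modules. The `profile_call` helper and the recursive `walk` function are hypothetical stand-ins, not ttconv code; in practice the profiled callable would be whatever entry point performs the conversion.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=3, **kwargs):
    """Run func under cProfile; return (result, text report of the `top` entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time, as in the figures quoted above.
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return result, buffer.getvalue()


def walk(depth):
    """Hypothetical recursive function standing in for a tree traversal."""
    if depth == 0:
        return 1
    return walk(depth - 1) + walk(depth - 1)


if __name__ == "__main__":
    value, report = profile_call(walk, 10)
    print(report)
```

For a recursive function, the ncalls column of the report shows two numbers separated by a slash: the total number of calls, then the number of primitive (non-recursive) calls.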

Please let me know if you need more details.

Jan 28 '25 07:01 nywhere

The current algorithm is not optimized for input documents that generate both a large number of regions with indefinite temporal intervals and a large number of captions/subtitles: all regions must be visited for each caption/subtitle (an N×M problem).
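To make the N×M cost concrete, here is a toy model, not the ttconv implementation. It assumes, purely for illustration, one region per caption and captions with distinct begin times. The `Caption` class and both functions are hypothetical names. The first function counts region visits when every region is scanned for every caption; the second sweeps the captions in time order and only reports the regions that actually hold an active caption.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Caption:
    region_id: str
    begin: float
    end: float  # exclusive


def naive_visits(region_ids, captions):
    """Count region visits when all regions are scanned for each caption."""
    visits = 0
    for _caption in captions:
        for _region_id in region_ids:
            visits += 1
    return visits


def regions_to_visit(captions):
    """Yield (begin time, ids of regions with an active caption) per caption.

    Assumes distinct begin times. A min-heap ordered by end time drops
    captions that are no longer active as the sweep advances.
    """
    active = []  # min-heap of (end, region_id)
    for caption in sorted(captions, key=lambda c: c.begin):
        while active and active[0][0] <= caption.begin:
            heapq.heappop(active)
        heapq.heappush(active, (caption.end, caption.region_id))
        yield caption.begin, {region_id for _, region_id in active}


if __name__ == "__main__":
    # Assumed worst case: 1033 back-to-back captions, each in its own region.
    captions = [Caption(f"r{i}", float(i), float(i + 1)) for i in range(1033)]
    region_ids = [c.region_id for c in captions]
    print("naive visits:", naive_visits(region_ids, captions))
    print("swept visits:", sum(len(ids) for _, ids in regions_to_visit(captions)))
```

Under these assumptions the naive count grows with the product of regions and captions, while the sweep grows with the number of captions plus the number of simultaneously active regions.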

A couple of options come to mind:

1. reduce the number of regions generated when reading an SCC document by coalescing regions with similar dimensions (probably a good idea in any event)
2. optimize the ISD generation algorithm (probably following a pattern similar to that at https://github.com/sandflow/imscJS/commit/b728b682f28fac4431fffe0588d9d0b9574a3f3d)
3. add multi-processing support (not sure it is entirely worth the effort)
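As a rough illustration of the region-coalescing idea, here is a sketch that maps regions with the same rounded origin and extent to a single representative. The `Region` class, the `coalesce_regions` function and the rounding tolerance are assumptions made for this example; they do not reflect the ttconv data model or how the SCC reader creates regions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    region_id: str
    origin: tuple  # (x, y), assumed to be in percent
    extent: tuple  # (width, height), assumed to be in percent


def coalesce_regions(regions, ndigits=1):
    """Map each region id to the id of the first region with the same rounded geometry."""
    representatives = {}  # rounded geometry -> first region seen with it
    mapping = {}          # region id -> representative region id
    for region in regions:
        key = (
            tuple(round(v, ndigits) for v in region.origin),
            tuple(round(v, ndigits) for v in region.extent),
        )
        representative = representatives.setdefault(key, region)
        mapping[region.region_id] = representative.region_id
    return mapping


if __name__ == "__main__":
    regions = [
        Region("r1", (10.0, 80.0), (80.0, 10.0)),
        Region("r2", (10.02, 80.0), (80.0, 10.0)),
        Region("r3", (10.0, 10.0), (80.0, 10.0)),
    ]
    print(coalesce_regions(regions))
```

Content that referenced a coalesced region would then be re-pointed at its representative. A real implementation would also have to check that the regions agree on every other property, not only on geometry.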

Jan 28 '25 16:01 palemieux

The first option definitely makes sense. Option 2 would be nice too.

Maybe some caching of computed styles could help too, if applicable of course.
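A minimal sketch of what such caching might look like, using functools.lru_cache. The `compute_style` function is hypothetical and deliberately simplified: it treats every property as inherited and represents styles as frozensets of (property, value) pairs so that they are hashable. Whether this maps onto _compute_styles in isd.py is exactly the "if applicable" question.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def compute_style(parent_computed, specified):
    """Return the computed style as a frozenset of (property, value) pairs.

    Simplifying assumption: all properties inherit, and specified values
    override inherited ones. Both arguments must be frozensets.
    """
    merged = dict(parent_computed)
    merged.update(specified)
    return frozenset(merged.items())


if __name__ == "__main__":
    root = frozenset({("color", "white"), ("fontSize", "100%")})
    specified = frozenset({("color", "red")})
    first = compute_style(root, specified)
    second = compute_style(root, specified)  # served from the cache
    print(dict(first))
    print(compute_style.cache_info())
```

The cache only pays off when many elements share the same parent style and the same specified style, which seems plausible for captions generated from a single SCC source but would need to be measured.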

Jan 28 '25 17:01 nywhere