ttconv

VTT Writer performance

Open • nywhere opened this issue 11 months ago • 2 comments

Hi,

The source is an SCC file with 1033 captions.

  • Converting it to TTML is OK
  • Converting it to VTT is very slow (10 s on a MacBook Pro)
  • Converting it to TTML and then to VTT is also very slow (10 s on a MacBook Pro)

source.scc.zip

Top 3 Time-Consuming Operations:

_process_element (isd.py:413)

  • Consumed 35.644s cumulative time
  • Called 2,165,274 times
  • Recursive function (note the ncalls format: 2165274/48445)

_compute_styles (isd.py:400)

  • Consumed 11.755s cumulative time
  • Called 150,222 times

set_style (model.py:339)

  • Consumed 5.424s cumulative time
  • Called 10,642,028 times
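
For anyone wanting to collect figures in this style, here is a minimal sketch using Python's standard cProfile and pstats modules. The recursive `walk` function and the `profile_top` helper are hypothetical stand-ins, not ttconv code; they only show how a "total/primitive" `ncalls` pair arises for a recursive function.

```python
import cProfile
import io
import pstats


def walk(depth: int) -> int:
    """Hypothetical recursive tree walk; returns the number of nodes visited."""
    if depth == 0:
        return 1
    return 1 + walk(depth - 1) + walk(depth - 1)


def profile_top(func, *args, limit: int = 3) -> str:
    """Run func under cProfile and return the top rows sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


# One outer call that recurses: ncalls is reported as total calls/primitive calls.
report = profile_top(walk, 10)
print(report)
```

In the printed table, the `ncalls` column for `walk` has the form total/primitive, which is the same notation as 2165274/48445 above. To profile a real conversion, the call to `walk` would be replaced by whatever function performs the conversion.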

Please let me know if you need more details.

nywhere, Jan 28 '25 07:01

The current algorithm is not optimized for input documents that generate both a large number of regions with indefinite temporal intervals and a large number of captions/subtitles: all regions must be visited for each caption/subtitle (an N×M problem).
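
To make the N×M cost concrete, here is a rough sketch that only counts region visits. It is not the ttconv implementation: the `Caption` class and both functions are hypothetical, and it assumes, purely for illustration, one region per caption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caption:
    begin: float
    end: float
    region_id: str


def naive_visits(captions, region_ids):
    """Count visits when every region is checked for every caption (N x M)."""
    visits = 0
    for _caption in captions:
        for _region_id in region_ids:
            visits += 1  # visited even if the region holds nothing for this caption
    return visits


def indexed_visits(captions, region_ids):
    """Count visits when only the region a caption refers to is checked (N)."""
    known = set(region_ids)
    return sum(1 for caption in captions if caption.region_id in known)


# Assumption for illustration: 1033 captions, each with its own region.
region_ids = [f"r{i}" for i in range(1033)]
captions = [Caption(float(i), float(i + 1), f"r{i}") for i in range(1033)]

print(naive_visits(captions, region_ids))    # 1033 * 1033 visits
print(indexed_visits(captions, region_ids))  # 1033 visits
```

With these assumed numbers the naive scan performs 1033 × 1033 = 1,067,089 region visits, against 1033 when each caption only touches its own region.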

A couple of options come to mind:

  • reduce the number of regions generated when reading an SCC document by coalescing regions with similar dimensions (probably a good idea in any event)
  • optimize the ISD generation algorithm (probably following a pattern similar to that at https://github.com/sandflow/imscJS/commit/b728b682f28fac4431fffe0588d9d0b9574a3f3d)
  • add multi-processing support (not sure it is entirely worth the effort)
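
As a rough illustration of the first option, regions could be grouped by rounded geometry so that captions share one representative region. This is only a sketch under assumptions: the `Region` class, `geometry_key` and `coalesce_regions` are hypothetical names, not ttconv APIs, and a real implementation would probably also need to compare styles and other region properties before merging.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    region_id: str
    x: float       # origin, in percent of the root container (assumed unit)
    y: float
    width: float   # extent, in percent
    height: float


def geometry_key(region, ndigits=1):
    """Round the geometry so that nearly identical regions share a key."""
    values = (region.x, region.y, region.width, region.height)
    return tuple(round(value, ndigits) for value in values)


def coalesce_regions(regions, ndigits=1):
    """Map every region id to the first region id seen with similar geometry."""
    representatives = {}  # geometry key -> representative region id
    remap = {}            # original region id -> representative region id
    for region in regions:
        key = geometry_key(region, ndigits)
        remap[region.region_id] = representatives.setdefault(key, region.region_id)
    return remap


regions = [
    Region("r1", 10.0, 80.0, 80.0, 15.0),
    Region("r2", 10.02, 80.0, 80.0, 15.0),  # nearly the same box as r1
    Region("r3", 10.0, 10.0, 80.0, 15.0),   # different vertical position
]
remap = coalesce_regions(regions)
print(remap)
```

Here `r2` is folded into `r1` while `r3` stays separate, so fewer regions would need to be visited per caption. The rounding precision is an arbitrary choice for the example.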

palemieux, Jan 28 '25 16:01

The first option definitely makes sense. Option 2 would be nice too.

Maybe some caching of computed styles could help too, if applicable of course.
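
To illustrate what I mean by caching, here is a minimal sketch using `functools.lru_cache`. It assumes styles can be represented as hashable sets of (name, value) pairs; the `compute_styles` function and the `INHERITED` set here are hypothetical and much simpler than the real style resolution, so this may not apply as is.

```python
from functools import lru_cache

# Hypothetical subset of properties that are inherited from the parent.
INHERITED = frozenset({"color", "fontFamily", "fontSize"})


@lru_cache(maxsize=None)
def compute_styles(parent: frozenset, specified: frozenset) -> frozenset:
    """Merge inherited parent values with the element's specified values."""
    computed = {name: value for name, value in parent if name in INHERITED}
    computed.update(dict(specified))  # specified values override inherited ones
    return frozenset(computed.items())


root = frozenset({("color", "white"), ("backgroundColor", "black")})
span = frozenset({("fontSize", "80%")})

first = compute_styles(root, span)   # computed once
second = compute_styles(root, span)  # same inputs: served from the cache
info = compute_styles.cache_info()
print(sorted(first), info.hits, info.misses)
```

The idea is that many captions share the same parent and specified styles, so repeated combinations would be computed only once.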

nywhere, Jan 28 '25 17:01