[feature request] Parallelized decoding example + option for encoder to add regular restart points
I found a paper on GPU-parallelized decoding of FLAC (http://users.cecs.anu.edu.au/~Eric.McCreath/papers/YeMcCreathISPA2018.pdf) and an implementation: https://github.com/Harinlen/GPUraku
Can an Opus bitstream be decoded in a parallelized fashion (offline/batch decoding where the full bitstream is available for scanning), probably after the range decoding step? The use case is to speed up offline decoding of large multi-hour files as a preprocessing step for a speech recognition pipeline.
If I understand correctly, this requires regular "sync points" at which the codec state is reset (e.g. as if once every minute the codec switched from CELT to CELT, so no actual mode switching, just a codec state reset at regular moments in time)? Is that right?
Can this be somehow hacked around with the existing libopus? Do you know of any attempts? It would also be interesting to know whether Chromium can be asked to generate such "sync points" in its stream encoding pipeline.
I also found work on parallelized Opus encoding by @enzo1982: https://www.freac.org/developer-blog-mainmenu-9/14-freac/257-introducing-superfast-conversions https://github.com/enzo1982/superfast https://github.com/enzo1982/freac. Would using this parallelized encoding scheme enable parallelized decoding as well?
In the IETF Opus document the word "parallel" appears zero times, so there is no discussion of this at all :)
Thank you!
It's somewhat so. You're correct that parallel decoding requires restart points, and Opus doesn't have independently-coded blocks the way FLAC does. Each encoded packet depends on state from previous packets, except where an encoder explicitly inserts such a restart point.
However, the state dependence is designed to decay over a few packets, so it's possible for decoders to join in-progress streams, for example in conferencing applications. The file format spec recommends discarding 80ms of decoded audio after a seek to allow for decoder convergence.
So if your streams are long enough that inserting 80ms overlaps between parallel decode segments won't hurt performance, you should be able to split the stream and re-combine the decoded segments. The result won't be bit-identical, but it will be perceptually indistinguishable, which is all a lossy codec provides in any case.
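As a rough, untested sketch of what that could look like in practice (using libopusfile rather than raw libopus, since it can open and seek Ogg Opus files and, as far as I understand, already performs the recommended pre-roll internally after op_pcm_seek()): each worker thread opens its own handle, seeks to its segment start, and decodes only its slice. The segment count, in-memory output buffers, and minimal error handling are all simplifications.

```c
/* Sketch: decode one Ogg Opus file in N segments, one thread per segment.
 * Assumes libopusfile (-lopusfile) and pthreads; error handling is minimal
 * and the whole decoded output is kept in memory for simplicity.
 * op_pcm_seek() handles the pre-roll internally, so each worker only
 * decodes its own slice and the slices can be concatenated in order. */
#include <opus/opusfile.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_SEGMENTS 4

typedef struct {
    const char *path;
    ogg_int64_t start;   /* first 48 kHz sample of this segment              */
    ogg_int64_t end;     /* one past the last sample of this segment         */
    float *pcm;          /* stereo interleaved output, 2*(end-start) floats  */
} segment_t;

static void *decode_segment(void *arg)
{
    segment_t *seg = (segment_t *)arg;
    int err = 0;
    OggOpusFile *of = op_open_file(seg->path, &err);   /* one handle per thread */
    if (of == NULL) return NULL;
    if (op_pcm_seek(of, seg->start) != 0) { op_free(of); return NULL; }

    ogg_int64_t pos = seg->start;
    while (pos < seg->end) {
        ogg_int64_t want = seg->end - pos;
        if (want > 5760) want = 5760;                  /* up to 120 ms per call */
        int got = op_read_float_stereo(of, seg->pcm + 2 * (pos - seg->start),
                                       (int)(2 * want));
        if (got <= 0) break;                           /* EOF or error */
        pos += got;
    }
    op_free(of);
    return NULL;
}

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s file.opus\n", argv[0]); return 1; }

    int err = 0;
    OggOpusFile *of = op_open_file(argv[1], &err);     /* opened once to get the length */
    if (of == NULL) return 1;
    ogg_int64_t total = op_pcm_total(of, -1);          /* total samples at 48 kHz */
    op_free(of);

    pthread_t tid[NUM_SEGMENTS];
    segment_t seg[NUM_SEGMENTS];
    for (int i = 0; i < NUM_SEGMENTS; i++) {
        seg[i].path  = argv[1];
        seg[i].start = total * i / NUM_SEGMENTS;
        seg[i].end   = total * (i + 1) / NUM_SEGMENTS;
        seg[i].pcm   = malloc(sizeof(float) * 2 * (seg[i].end - seg[i].start));
        pthread_create(&tid[i], NULL, decode_segment, &seg[i]);
    }
    for (int i = 0; i < NUM_SEGMENTS; i++) pthread_join(tid[i], NULL);
    /* seg[0..N-1].pcm now hold consecutive slices of the decoded audio. */
    for (int i = 0; i < NUM_SEGMENTS; i++) free(seg[i].pcm);
    return 0;
}
```

Since the convergence cost is only on the order of 80ms per segment, for multi-hour files the boundaries can be chosen purely by sample count as above.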
Got it. Just in case, I'll upgrade this issue to be a feature request for a few ideas:
- to add example programs of parallelized decoding with two modes: (1) without scanning for restart points, (2) with scanning for restart points
- maybe also an example of parallelized encoding (even if not production-quality), as experimented with by @enzo1982 in https://github.com/enzo1982/superfast and https://github.com/enzo1982/freac - it introduces some overlap frames at the block boundaries (some discussion at https://github.com/enzo1982/freac/issues/505#issuecomment-1609695622)
- to add an option to the libopus encoder for emitting restart points (TBD whether they should be emitted every N frames, or whether the interval should account for adaptive frame durations and depend on actual audio time). Could the encoding mode changes be abused to get something like restart points? The decoding mode itself wouldn't change, but the codec state would be reset (e.g. a CELT->CELT "fake" switch). Maybe the already supported PLC machinery could help here, since regular restart points are in a way related to missing-context recovery. A purely hypothetical sketch of what such an option could look like follows this list.
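To make that last point concrete, here is a purely hypothetical, untested sketch of how an application could approximate restart points today from the outside. The only real pieces are the standard encode API and OPUS_RESET_STATE; the packet_cb callback and encode_with_restarts() are made up for the sketch, whether a reset encoder really produces a fully self-contained packet in every mode is an assumption I haven't verified, and the restart positions would still need to be signalled out of band so a parallel decoder can find them.

```c
/* Sketch: approximate regular restart points with the existing API by
 * resetting the encoder every RESTART_INTERVAL_FRAMES frames.
 * OPUS_RESET_STATE is a real CTL; the assumption (unverified) is that the
 * first packet after a reset can be decoded with a freshly created decoder. */
#include <opus/opus.h>

#define SAMPLE_RATE 48000
#define CHANNELS 1
#define FRAME_SIZE 960                 /* 20 ms at 48 kHz                   */
#define RESTART_INTERVAL_FRAMES 3000   /* 3000 * 20 ms = one restart/minute */

/* Hypothetical callback used only to keep the sketch short: the caller
 * records packet data plus which packets are restart points. */
typedef void (*packet_cb)(const unsigned char *data, int len, int is_restart);

int encode_with_restarts(const opus_int16 *pcm, long num_frames, packet_cb emit)
{
    int err;
    OpusEncoder *enc = opus_encoder_create(SAMPLE_RATE, CHANNELS,
                                           OPUS_APPLICATION_AUDIO, &err);
    if (err != OPUS_OK) return err;

    unsigned char packet[4000];
    for (long i = 0; i < num_frames; i++) {
        int is_restart = (i % RESTART_INTERVAL_FRAMES) == 0;
        if (is_restart && i > 0)
            opus_encoder_ctl(enc, OPUS_RESET_STATE);  /* drop all inter-frame state */

        opus_int32 len = opus_encode(enc, pcm + (long)i * FRAME_SIZE * CHANNELS,
                                     FRAME_SIZE, packet, sizeof(packet));
        if (len < 0) { opus_encoder_destroy(enc); return len; }
        emit(packet, len, is_restart);                /* record restart positions */
    }
    opus_encoder_destroy(enc);
    return OPUS_OK;
}
```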
(if I hack something together, I'll post it here, but probably not very soon)
If you are operating in CELT-only mode, there is also a slightly hidden encoder control: CELT_SET_PREDICTION (check celt.h). If you set this to 0, frames should be encoded independently.
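Untested sketch of how that control could be exercised, going through the opus_custom API where it is definitely handled. Since celt.h is an internal header, this assumes libopus is built from source with --enable-custom-modes and compiled against the source tree; note also that opus_custom emits raw CELT frames rather than standard Opus packets, and I haven't checked whether the regular opus_encoder_ctl() path forwards this request.

```c
/* Sketch: disable interframe prediction so each CELT frame is coded
 * independently. CELT_SET_PREDICTION lives in the internal celt/celt.h,
 * so this assumes building against the libopus source tree with custom
 * modes enabled. The opus_custom API produces raw CELT frames, not
 * standard Opus packets. */
#include <opus/opus_custom.h>
#include "celt/celt.h"   /* internal header: CELT_SET_PREDICTION() */

int make_independent_celt_encoder(OpusCustomEncoder **out)
{
    int err;
    /* 48 kHz, 960-sample (20 ms) frames: the same CELT mode standard Opus uses. */
    OpusCustomMode *mode = opus_custom_mode_create(48000, 960, &err);
    if (err != OPUS_OK) return err;

    OpusCustomEncoder *enc = opus_custom_encoder_create(mode, 2, &err);
    if (err != OPUS_OK) { opus_custom_mode_destroy(mode); return err; }

    /* 0 = independent frames, 1 = short-term prediction, 2 = long-term. */
    err = opus_custom_encoder_ctl(enc, CELT_SET_PREDICTION(0));
    if (err != OPUS_OK) { opus_custom_encoder_destroy(enc); return err; }

    *out = enc;
    return OPUS_OK;
}
```

Depending on the libopus version, the public OPUS_SET_PREDICTION_DISABLED CTL in opus_defines.h may give a similar effect through the regular opus_encoder_ctl() path, at some cost in quality.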