Cloud Recording - Seektable attribute in cloud object metadata
With the Cloud Recording feature, devices push video files (or "segments" in this context) to cloud object storage. When a user wishes to play back these files, a large file delays the call-up time because the entire file has to be downloaded first. This is especially true if the user wishes to use HLS or MPEG-DASH to play video across multiple files.
By adding a simple seek table to the blob's metadata, a video player could download a smaller part of the file and start playing faster. It also allows the player to download only the parts that the user wants to play, and thus to seek within the file more efficiently.
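As a rough illustration of the lookup described above, here is a minimal sketch that maps a seek time to the byte range of the fragment containing it. The entry layout (a sorted list of `(start_time_seconds, byte_offset)` pairs) and the name `byte_range_for` are assumptions made for this example, not the format proposed in the PR.

```python
from bisect import bisect_right


def byte_range_for(entries, seek_time, file_size):
    """Return (first_byte, last_byte) of the fragment containing seek_time.

    entries: list of (start_time_seconds, byte_offset), sorted by start time.
    This layout is hypothetical and only serves the example.
    """
    times = [t for t, _ in entries]
    # Index of the last fragment that starts at or before seek_time;
    # clamp to the first fragment when seeking before the first entry.
    i = max(bisect_right(times, seek_time) - 1, 0)
    start = entries[i][1]
    # The fragment ends where the next one begins, or at the end of the file.
    end = entries[i + 1][1] - 1 if i + 1 < len(entries) else file_size - 1
    return start, end


entries = [(0.0, 1024), (2.0, 500_000), (4.0, 1_000_000)]
start, end = byte_range_for(entries, 2.5, 1_500_000)
range_header = f"bytes={start}-{end}"  # value for an HTTP Range request header
```

A player would presumably also need the initialization data at the start of the file (everything before the first fragment) before it can decode the fetched range.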
This seek table leverages the fact that a file (or segment in this context) can be built from multiple individual, autonomous fragments according to the CMAF/fragmented MP4 standards.
Accessing this information in the metadata is much faster (and less costly) than downloading part of the file to parse the "SIDX" box at the start (or the "MFRA" box at the end), both of which are currently optional. Additionally, this feature would be compatible with a potential future full-file encryption option.
I prefer to refer to existing standards as much as possible. The mention of full-file encryption is an important point. We should analyze how decryption can start at random fragments.
Is this seek table object attribute mandatory or optional? I am concerned about the extra work on the device side (mainly for multi-source cameras) to create this table and to dynamically merge fragment offsets so that the list does not exceed 20 entries for each track (audio, video and metadata) separately in the case of CMAF.
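To make the cost of that merging step concrete, here is one possible way a device could reduce a per-track list to the limit once the fragment list is known. It is only a sketch: the even-spacing strategy and the name `thin_entries` are assumptions, and the PR does not prescribe how entries are selected.

```python
def thin_entries(entries, limit=20):
    """Keep at most `limit` entries, spread evenly over the recording.

    entries: per-track list of (time, moof_offset), sorted by time.
    The first and last entries are always kept.
    """
    if len(entries) <= limit:
        return list(entries)
    if limit == 1:
        return [entries[0]]
    last = len(entries) - 1
    # Pick `limit` evenly spaced indices between 0 and the last index.
    indices = sorted({round(i * last / (limit - 1)) for i in range(limit)})
    return [entries[i] for i in indices]


# 100 fragments of 2 s each, 250 000 bytes apart (made-up numbers).
fragments = [(2 * n, 250_000 * n) for n in range(100)]
table = thin_entries(fragments)
```

With this approach the work is a single pass per track at the end of the segment rather than a running merge during recording.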
This seek table object would be optional, like all other blob metadata fields. Its presence doesn't add much functionality, but it can greatly improve the performance of some use cases.
Instead of defining our own format, I suggest using a hex- or base64-encoded MP4 track fragment random access box (TFRA). Please note that this information can typically only be written after the file upload has completed, which may introduce extra cost depending on provider policies.
I updated this PR and applied Hans's recommendations. Please have a look again :) JF and José will be available at the face-to-face to discuss the implications of this change. Note: I used the MFRA (movie fragment random access) box instead of the TFRA to support the format where there are multiple tracks in the file. The only difference is that MFRA can contain one or multiple TFRA boxes.
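For readers who want to see what consuming such an attribute could look like, here is a minimal sketch that decodes a base64-encoded MFRA box and extracts the `(time, moof_offset)` pairs of each contained TFRA box, following the ISO BMFF box layout. The function names are hypothetical, base64 is assumed as the encoding, and boxes using the 64-bit `largesize` form are not handled.

```python
import base64
import struct


def parse_seek_table_attribute(value):
    """Decode a base64 metadata value (assumed encoding) and parse it."""
    return parse_mfra(base64.b64decode(value))


def parse_mfra(data):
    """Return {track_id: [(time, moof_offset), ...]} from a raw 'mfra' box."""
    size, box_type = struct.unpack_from(">I4s", data, 0)
    if box_type != b"mfra":
        raise ValueError("not an mfra box")
    tracks = {}
    pos, end = 8, min(size, len(data))
    # Walk the child boxes; only 'tfra' is of interest, 'mfro' is skipped.
    while pos + 8 <= end:
        child_size, child_type = struct.unpack_from(">I4s", data, pos)
        if child_size < 8:
            break  # malformed or largesize box, not handled in this sketch
        if child_type == b"tfra":
            tracks.update(_parse_tfra(data, pos + 8))
        pos += child_size
    return tracks


def _parse_tfra(data, pos):
    """Parse a 'tfra' payload starting at the FullBox version byte."""
    version = data[pos]
    track_id, lengths, count = struct.unpack_from(">III", data, pos + 4)
    pos += 16
    # Each entry ends with traf/trun/sample numbers of (length + 1) bytes each.
    tail = ((lengths >> 4) & 3) + ((lengths >> 2) & 3) + (lengths & 3) + 3
    fmt = ">QQ" if version == 1 else ">II"
    step = struct.calcsize(fmt)
    entries = []
    for _ in range(count):
        entries.append(struct.unpack_from(fmt, data, pos))
        pos += step + tail
    return {track_id: entries}
```

The times returned are in each track's media timescale, and the offsets are byte positions of the `moof` boxes from the start of the file.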
Please change
Additionally the SeekTable should contain more than twenty entries.
to
Additionally the SeekTable should contain no more than twenty entries.
FFmpeg uses a hard-coded timescale of 1000, which corresponds to milliseconds. Should we define this as the default?
We've seen a good variety of timescales in practice, from 1000 to 90'000, to 30'000'000. It's been pretty spread out so I'm not sure if we want to try to pin a specific scale. I'd be curious if other vendors have more examples? At the end of the day, a single fixed one would be nice for us if doable.
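Whichever way this is decided, a reader of the table can normalize times itself as long as it knows the track's timescale. A small sketch, assuming the timescale is available to the client (for example from the track's `mdhd` box); the helper name `rescale` is made up for this example:

```python
def rescale(ticks, src_timescale, dst_timescale=1000):
    """Convert a time in src_timescale units to dst_timescale units.

    Uses integer arithmetic and rounds to the nearest unit, so large
    64-bit tick values do not lose precision as they could with floats.
    """
    return (ticks * dst_timescale + src_timescale // 2) // src_timescale


# The same 2-second position expressed in the timescales mentioned above.
ms_from_90k = rescale(180_000, 90_000)
ms_from_30m = rescale(60_000_000, 30_000_000)
ms_from_1k = rescale(2_000, 1_000)
```

All three calls yield 2000 milliseconds, which is what a fixed default of 1000 would store directly.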
Accidentally closed the PR.