Support seeking into downloaded fMP4 file without top level sidx box

Open milos-pesic-zattoo opened this issue 6 years ago • 12 comments

[REQUIRED] Use case description

The main use case is seeking within already downloaded fMP4 files that don't have a top-level sidx box. According to the ISOBMFF spec, the sidx box isn't required. An example of such a file can be found at: https://drive.google.com/drive/folders/1kFDugORTMhPWq5fZZMPXjrB7E7oZSe5x?usp=sharing ExoPlayer isn't able to seek in these kinds of files because they lack the sidx box, so the seek map can't be built with the existing logic. The structure of these fMP4 files can be seen in the sample from the link above, but in short, the top-level mp4 boxes are organised this way:

ftyp
pdin
moov
moof
mdat
moof
mdat
... // repeating moof and mdat for each fragment
free
mfra
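
To check whether a given file follows this layout, the top-level boxes can be listed by reading each box header in turn. The following is a minimal Python sketch, not ExoPlayer code; the function name `iter_top_level_boxes` is made up for illustration, and it assumes the standard ISOBMFF header (32-bit size, 4-character type, optional 64-bit largesize when size is 1, size 0 meaning "to end of file").

```python
import io
import struct


def iter_top_level_boxes(stream):
    """Yield (box_type, offset, size) for each top-level box in a seekable stream."""
    stream.seek(0, io.SEEK_END)
    end = stream.tell()
    offset = 0
    while offset + 8 <= end:
        stream.seek(offset)
        size, box_type = struct.unpack(">I4s", stream.read(8))
        if size == 1:
            # A size of 1 means a 64-bit largesize follows the type field.
            (size,) = struct.unpack(">Q", stream.read(8))
        elif size == 0:
            # A size of 0 means the box extends to the end of the file.
            size = end - offset
        if size < 8:
            raise ValueError(f"invalid box size {size} at offset {offset}")
        yield box_type.decode("ascii"), offset, size
        offset += size
```

For a file like the sample described above, the yielded types would be expected to read ftyp, pdin, moov, then alternating moof and mdat, and finally free and mfra, with no sidx among them.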

Background and motivation

The use case for these files could be a server that stores media for each fragment individually and needs to provide a download feature for a media file (e.g. a movie) consisting of thousands of these fragments. Providing a top-level sidx box in such a case is challenging from the backend perspective, as the content of all fragments needs to be analysed before packaging the media and starting to serve the download request.

The scope of the feature request is seeking support for already downloaded fMP4 files, as supporting streaming from the network for these cases might be challenging.

Testing done on other players/platforms: AVPlayer on iOS, QuickPlayer on OSX, Windows Media Player on Windows, and VLC player already support this use case.

Proposed solution

Possible options we thought about for solving the problem are:

  • using the mfra box if available (the last box in the given sample): its tfra child boxes hold a list of entries for each track containing time and moof offsets, so maybe this info could be used to build the seek map.
  • another possible solution is inserting a sidx box for each fragment on the server side, which could contain the earliest presentation time for that fragment. The logic for building the seek table could then rely on these fragment-level sidx boxes. This is just an idea that we haven't fully explored yet.
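
To illustrate the first option, here is a rough Python sketch of reading the time and moof offset pairs out of a single tfra box body. It is not taken from ExoPlayer; `TfraEntry` and `parse_tfra` are hypothetical names, and the field layout is written from the tfra definition in ISO/IEC 14496-12 as understood here (version 1 uses 64-bit time and offset fields, version 0 uses 32-bit ones). The times are in the track's timescale, so a caller would still need the timescale from the track's mdhd box to convert them.

```python
import struct
from typing import List, NamedTuple, Tuple


class TfraEntry(NamedTuple):
    time: int         # presentation time of the sync sample, in track timescale units
    moof_offset: int  # byte offset of the containing moof from the start of the file


def parse_tfra(payload: bytes) -> Tuple[int, List[TfraEntry]]:
    """Parse a tfra box body (everything after the 8-byte box header)."""
    version = payload[0]  # bytes 1..3 are the FullBox flags, unused here
    track_id, sizes, entry_count = struct.unpack_from(">III", payload, 4)
    # The low 6 bits hold three 2-bit "length minus one" values.
    traf_len = ((sizes >> 4) & 0x3) + 1
    trun_len = ((sizes >> 2) & 0x3) + 1
    sample_len = (sizes & 0x3) + 1
    field_fmt = ">QQ" if version == 1 else ">II"
    field_size = struct.calcsize(field_fmt)
    pos = 16
    entries = []
    for _ in range(entry_count):
        time, moof_offset = struct.unpack_from(field_fmt, payload, pos)
        entries.append(TfraEntry(time, moof_offset))
        # Skip traf_number, trun_number and sample_number; they are not needed
        # to map a time to a moof offset.
        pos += field_size + traf_len + trun_len + sample_len
    return track_id, entries
```

The resulting list has one entry per sync sample that the muxer chose to index, which may be coarser than one entry per fragment.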

Alternatives considered

NA

milos-pesic-zattoo avatar Nov 27 '19 17:11 milos-pesic-zattoo

The use case for these files could be a server that stores media for each fragment individually and needs to provide a download feature for a media file (e.g. a movie) consisting of thousands of these fragments. Providing a top-level sidx box in such a case is challenging from the backend perspective, as the content of all fragments needs to be analysed before packaging the media and starting to serve the download request.

Can you provide a bit more information about the use case? In particular, what's the real-life scenario where it would make sense to end up with thousands of individual fragments corresponding to a movie, without also having any corresponding indexing information, on the server side?

The only case I can really think of is that you've previously live-streamed the content using HLS (but not DASH, because the segments appear to be muxed). However, it's unclear why you wouldn't have any indexing information in that case, from which you could easily generate either a sidx box or an HLS media playlist, both of which would presumably solve this problem.

ojw28 avatar Dec 02 '19 13:12 ojw28

Sure, your guess is good: it's about reusing data that has already been live streamed. To be streaming-protocol agnostic, the encoded media data might be stored in a common container which is not necessarily mp4. So every x seconds of media data is encoded and muxed into this container (referred to below as a media fragment). In the live streaming case, each of these fragments becomes an individual segment (e.g. an HLS segment): the streaming backend takes the original media fragment, remuxes it into the appropriate container (depending on the streaming protocol the client requested) and serves it as a DASH or HLS segment. When content is downloaded, the backend could reuse the media fragments, remux them to mp4 and serve them to users.

Building a top-level sidx box when the original data isn't packaged in mp4 could be challenging, if possible at all, since a lot of things could change during broadcasting a stream, from one media fragment to the other (e.g number of audio tracks and audio codecs used). A seeking/indexing map for all live streams available on a platform would therefore need to contain a lot of metadata in addition to size and timing info, so that the sidx box could be correctly assembled upfront. It might be possible, but it could be complex and still time consuming (the index table and metadata fetching and calculation would need to happen on each download request). I hope this provides a bit more context.

milos-pesic-zattoo avatar Dec 03 '19 13:12 milos-pesic-zattoo

since a lot of things could change during broadcasting a stream, from one media fragment to the other (e.g number of audio tracks and audio codecs used)

How do you handle this when constructing the FMP4 file to be downloaded, given you can't change the tracks (either the number of tracks or their properties), as far as I can tell from ISO 14496-12? Are you re-encoding on the fly into a fixed number of tracks with fixed properties?

Have you considered generating a DASH manifest for the download stream, which would presumably avoid the issue you're currently running into (and is presumably much more in-line with your flow for the streaming case)?

ojw28 avatar Dec 05 '19 19:12 ojw28

Hey @milos-pesic-zattoo. We need more information to resolve this issue but there hasn't been an update in 14 days. I'm marking the issue as stale and if there are no new updates in the next 7 days I will close it automatically.

If you have more information that will help us get to the bottom of this, just add a comment!

google-oss-bot avatar Dec 20 '19 02:12 google-oss-bot

Unfortunately, our use case does not allow downloading a DASH manifest. We need to send fragmented mp4 files as chunks to clients in a “live” manner, and we only know the fragment mapping or the info required for seeking after sending all requested fragmented mp4 files.

milos-pesic-zattoo avatar Dec 23 '19 16:12 milos-pesic-zattoo

We will leave this open to track the feature request, however it's unlikely to be prioritized in the near term. The problem you're facing seems to be a consequence of some architecture choices on the serving side that, to the best of my knowledge, no one else has made, which means it ends up a fairly long way down the list when ranked in terms of cost/benefit.

ojw28 avatar Jan 02 '20 18:01 ojw28

I did some investigation on building the seekMap using the data from mfra. We can extract durations and moof box offsets from there but the seekmap also requires the size of the moof atoms which is not encoded in the tfra atoms (unless I missed those). Creating a valid seekmap would thus involve getting the size for every moof atom referenced in the tfra. This requires a fully seekable input concept which does (currently) not exist.

Correct me if I'm wrong @ojw28

NicolaVerbeeck avatar Feb 16 '22 15:02 NicolaVerbeeck

But the seekmap also requires the size of the moof atoms

I don't think this is required. Perhaps you've concluded this having looked at the ChunkIndex class? That's an implementation of SeekMap, but I don't think there's anything forcing you to use it. Note also that it doesn't actually use the chunk sizes to implement the SeekMap interface, so it should be straightforward to create a similar implementation that omits them entirely.
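
As an illustration of this point, a seek index that maps a requested time to the nearest earlier entry needs only sorted times and byte offsets. The Python sketch below is not ExoPlayer's SeekMap interface or its ChunkIndex class; `OffsetOnlySeekIndex` and its method are hypothetical names, and it assumes the entry times have already been converted to microseconds and sorted in ascending order.

```python
import bisect
from typing import Sequence, Tuple


class OffsetOnlySeekIndex:
    """Maps a seek time to the closest earlier (time, byte offset) entry."""

    def __init__(self, times_us: Sequence[int], offsets: Sequence[int], duration_us: int):
        if not times_us or len(times_us) != len(offsets):
            raise ValueError("times_us and offsets must be non-empty and equal in length")
        self._times = list(times_us)      # ascending entry start times
        self._offsets = list(offsets)     # byte offset of the moof for each entry
        self.duration_us = duration_us

    def seek_point(self, time_us: int) -> Tuple[int, int]:
        """Return the (time_us, offset) of the last entry starting at or before time_us."""
        index = bisect.bisect_right(self._times, time_us) - 1
        index = max(index, 0)  # clamp requests before the first entry
        return self._times[index], self._offsets[index]
```

Note that no chunk sizes appear anywhere: the lookup is a binary search over start times, and the reader simply starts parsing from the returned offset.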

ojw28 avatar Feb 17 '22 23:02 ojw28

I see, thanks for the clarification!

NicolaVerbeeck avatar Feb 18 '22 07:02 NicolaVerbeeck

@NicolaVerbeeck Have you found a solution to this problem?

ajax-kokosh-r avatar Oct 21 '24 14:10 ajax-kokosh-r

I seem to recall I did, though that's 3 years ago and I can't remember (and if it was for a client I will have removed the code from my pc)

NicolaVerbeeck avatar Oct 21 '24 15:10 NicolaVerbeeck

I've managed to implement it using mfra/tfra boxes. But those only list keyframes, so I decided to rely on trun boxes instead, and this also works.
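
The comment above does not say how the trun-based approach was implemented. One guess at what it might involve is summing the sample durations in each fragment's trun box to work out where the next fragment starts. The Python sketch below covers only that step for a single trun body; the function name is hypothetical, the flag values are the trun flags from ISO/IEC 14496-12 as understood here, and the default duration is assumed to be supplied by the caller (it would normally come from the tfhd or trex box).

```python
import struct

# trun flag bits
TRUN_DATA_OFFSET = 0x000001
TRUN_FIRST_SAMPLE_FLAGS = 0x000004
TRUN_SAMPLE_DURATION = 0x000100
TRUN_SAMPLE_SIZE = 0x000200
TRUN_SAMPLE_FLAGS = 0x000400
TRUN_SAMPLE_CTO = 0x000800


def trun_total_duration(payload: bytes, default_sample_duration: int) -> int:
    """Sum sample durations in a trun box body (after the 8-byte box header)."""
    flags = int.from_bytes(payload[1:4], "big")
    (sample_count,) = struct.unpack_from(">I", payload, 4)
    pos = 8
    if flags & TRUN_DATA_OFFSET:
        pos += 4
    if flags & TRUN_FIRST_SAMPLE_FLAGS:
        pos += 4
    if not flags & TRUN_SAMPLE_DURATION:
        # No per-sample durations: every sample uses the default.
        return sample_count * default_sample_duration
    per_sample_fields = (TRUN_SAMPLE_DURATION, TRUN_SAMPLE_SIZE,
                         TRUN_SAMPLE_FLAGS, TRUN_SAMPLE_CTO)
    record_size = 4 * sum(1 for f in per_sample_fields if flags & f)
    total = 0
    for i in range(sample_count):
        # When present, the duration is the first field of each sample record.
        (duration,) = struct.unpack_from(">I", payload, pos + i * record_size)
        total += duration
    return total
```

Accumulating these totals across fragments, in the track's timescale, gives a start time per moof without relying on mfra at all, at the cost of scanning every fragment header once.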

ajax-kokosh-r avatar Nov 13 '24 13:11 ajax-kokosh-r