Support for Multiple Audio Streams (in single segment)
@robwalch - I'm not 100% sure if it was about what I thought it would be about, but a few weeks ago, there was a custom item in the "Release Planning and Backlog" project, which didn't link to any issue.
What happened to that item and was it about the same (issue title) or something different?
I had read some earlier discussions about this, where references were made to the Apple HLS authoring spec, which demands no more than a single audio track in segments, but the actual spec says something very different:
Clients SHOULD be prepared to handle multiple tracks of a particular type (e.g., audio or video). A client with no other preference SHOULD choose the track with the lowest numerical track identifier that it can play.
From my interpretation, this means that clients are even encouraged to be capable of supporting multiple (audio-)streams, and that the minimum required behavior is to choose the track with the lowest PID.
Granted, the spec doesn't impose requirements here the way it does for switching renditions or variants, but what I have in mind is something very simple for a start. There are two parts:
Part 1: Provide Audio Stream Information
- When decoding the PMT, collect all available (and supported) audio streams
- Parse a few descriptors to get information about language, disposition and stream names
- Provide an API that allows reading that information, including the PID of each audio stream (so that implementers can display the streams for user selection)
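
As a rough illustration of Part 1, the sketch below walks the elementary-stream loop of a PMT section and collects the audio entries, together with the language and audio type from the ISO 639 language descriptor (tag 0x0A). The `TsAudioStreamInfo` type, the `collectAudioStreams` name, and the set of stream types treated as audio are assumptions made for this example, not hls.js code. CRC checking and sections spanning multiple TS packets are left out.

```typescript
// Hypothetical shape for the per-stream information; not an hls.js type.
interface TsAudioStreamInfo {
  pid: number;
  streamType: number;
  language?: string; // ISO 639-2 code from descriptor 0x0A, if present
  audioType?: number; // 0x02 = hearing impaired, 0x03 = visual impaired commentary
}

// stream_type values treated as audio here (illustrative subset):
// 0x03/0x04 MPEG audio, 0x0f AAC (ADTS), 0x81 AC-3.
const AUDIO_STREAM_TYPES = [0x03, 0x04, 0x0f, 0x81];

// `section` must start at the PMT table_id byte (pointer_field already skipped).
function collectAudioStreams(section: Uint8Array): TsAudioStreamInfo[] {
  const sectionLength = ((section[1] & 0x0f) << 8) | section[2];
  const end = 3 + sectionLength - 4; // stop before the 4-byte CRC_32
  const programInfoLength = ((section[10] & 0x0f) << 8) | section[11];
  let offset = 12 + programInfoLength; // first entry of the stream loop
  const streams: TsAudioStreamInfo[] = [];

  while (offset + 5 <= end) {
    const streamType = section[offset];
    const pid = ((section[offset + 1] & 0x1f) << 8) | section[offset + 2];
    const esInfoLength = ((section[offset + 3] & 0x0f) << 8) | section[offset + 4];
    const descEnd = Math.min(offset + 5 + esInfoLength, end);

    if (AUDIO_STREAM_TYPES.indexOf(streamType) !== -1) {
      const info: TsAudioStreamInfo = { pid, streamType };
      let d = offset + 5;
      // Walk the descriptors attached to this elementary stream.
      while (d + 2 <= descEnd) {
        const tag = section[d];
        const len = section[d + 1];
        if (tag === 0x0a && len >= 4 && d + 6 <= descEnd) {
          // Only the first language entry of the descriptor is read.
          info.language = String.fromCharCode(section[d + 2], section[d + 3], section[d + 4]);
          info.audioType = section[d + 5];
        }
        d += 2 + len;
      }
      streams.push(info);
    }
    offset = descEnd;
  }
  return streams;
}
```

The returned list is what an application could show to the user for selection, keyed by PID.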
Part 2: Allow specifying a preferred audio stream by PID on load
- The heading says it: it will require the player to be re-loaded
- Might not be as elegant as switching audio while playing, but it simplifies things a lot:
  - There have been comments that MSE implementations in browsers often do not support more than a single audio track. That support won't be required at all, since there is always just a single audio track.
  - Re-creating the player to switch to another audio track won't make for a really bad experience, because the segments don't change and those around the playback position are still cached by the browser.
The benefit of that idea would be that it could be implemented quickly and would be non-invasive, as the changes are minimal.
Please let me know what you think about it!
Thanks, softworkz
Hi @softworkz,
Do other HLS players support multiple audio tracks in MPEG-2 TS segments (PES/PMT elementary streams)?
The spec says that, given no other preference, a client should "handle" multiple tracks by choosing the first. We do that and I believe that is in line with Safari's behavior.
For compatibility with most client applications, alternate audio should be packaged in media playlist tracks.
> Hi @softworkz,
> Do other HLS players support multiple audio tracks in MPEG-2 TS segments (PES/PMT elementary streams)?
Yes: MPV Player, ExoPlayer, VLC. Probably others.
> The spec says that, given no other preference, a client should "handle" multiple tracks by choosing the first. We do that and I believe that is in line with Safari's behavior.
Yes, it states "given no other preference", and the goal would be to provide a means to allow giving that preference.
> For compatibility with most client applications, alternate audio should be packaged in media playlist tracks.

I agree with that when you are packaging for a larger audience. But when you are creating live streams on demand for single users only, it is not very practical if it means running multiple transcodes in parallel, and it is outright impossible when you are just segmenting a live TV stream without any other modification.
I understand that this is not the primary direction for multiple audio streams; that's why I'm proposing just this pretty minimal way to provide a "preference" for the audio stream selection. Just "Part 2" would even be sufficient for our case (we "know" the streams and can tell the client); I'm proposing "Part 1" only to make it useful for others as well.
BTW: I'd just need some structural guidance, i.e. where to put the code and how to expose it API-wise; then I would fill in the details (e.g. audio streams and descriptor parsing).
How about Safari? I don't think we should support this if not supported by Safari.
Parsing is simple enough in tsdemuxer. The challenge is that HLS.js only has a single audio SourceBuffer.
Currently, with media tracks, switching audio involves loading a different track playlist, the audio buffer is cleared, and the new track segments are loaded instead. This is handled by audio-track-controller and audio-stream-controller.
With multiple streams in the main variant, TS segments would need to be reloaded when a different audio stream is selected. That logic would need to be handled in stream-controller. This makes it very challenging to have a single source of truth for audio tracks, their selection, and management of the fragment-tracker and buffer.
Hi Rob,
I understand the difficulties involved in switching streams while running. That's why I'm not proposing to do that at all. Sorry for being unclear on the proposal...
> Part 2: Allow specifying a preferred audio stream by PID on load
What I meant instead is as simple as it could be: allow an implementer to specify an audio stream preference on playlist load, meaning as part of a playback request when supplying the playlist URL to hls.js.
The preference can be in the form of a PID in the simplest case, or maybe a preferred language or disposition (e.g. 'for the hearing impaired'). I think those are the preferences that are meant by the spec; at least those are the typical classifications of audio streams.
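
To make the idea of a "preference" concrete, here is a minimal sketch of how a PID, language, or disposition preference could be resolved against the audio streams found in a segment. All names (`TsAudioStream`, `TsAudioPreference`, `selectAudioStream`) are hypothetical and exist only for this example. With no usable preference, it falls back to the lowest PID, following the spec text quoted earlier in the thread.

```typescript
// Hypothetical types for illustration; none of these exist in hls.js.
interface TsAudioStream {
  pid: number;
  language?: string;
  hearingImpaired?: boolean;
}

interface TsAudioPreference {
  pid?: number;
  language?: string;
  hearingImpaired?: boolean;
}

function selectAudioStream(
  streams: TsAudioStream[],
  pref: TsAudioPreference = {}
): TsAudioStream | undefined {
  if (streams.length === 0) {
    return undefined;
  }
  // An explicit PID wins if that stream is actually present.
  if (pref.pid !== undefined) {
    const byPid = streams.filter((s) => s.pid === pref.pid)[0];
    if (byPid) {
      return byPid;
    }
  }
  // Otherwise narrow by language, then by disposition, ignoring any
  // criterion that would leave no candidates at all.
  let candidates = streams;
  if (pref.language !== undefined) {
    const byLang = candidates.filter((s) => s.language === pref.language);
    if (byLang.length > 0) {
      candidates = byLang;
    }
  }
  if (pref.hearingImpaired !== undefined) {
    const byDisposition = candidates.filter(
      (s) => (s.hearingImpaired ?? false) === pref.hearingImpaired
    );
    if (byDisposition.length > 0) {
      candidates = byDisposition;
    }
  }
  // "No other preference": pick the lowest PID among what is left.
  return candidates.reduce((a, b) => (b.pid < a.pid ? b : a));
}
```

The ordering of the criteria (PID first, then language, then disposition) is a design choice of this sketch, not something the spec prescribes.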
The procedure for the implementer to change the audio stream preference while playback is running would be:
- store playback position
- stop playback
- destroy hls.js
- create a new instance of hls.js
- reload the playlist, while indicating the audio stream preference
- seek to the stored playback position
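
The steps above could be wrapped in a small helper along these lines. The sketch is written against minimal structural types instead of importing hls.js, so that it stands alone. `loadSource`, `attachMedia`, `destroy` and the `startPosition` config option exist in hls.js, while `preferredAudioPid` is a purely hypothetical option name standing in for whatever the preference setting would be called. Passing the stored position as `startPosition` takes the place of the explicit seek in the last step.

```typescript
// Minimal structural stand-ins; an HTMLMediaElement and an hls.js instance
// provide these members, but nothing here imports the real library.
interface MediaLike {
  currentTime: number;
  pause(): void;
}

interface PlayerLike {
  loadSource(url: string): void;
  attachMedia(media: MediaLike): void;
  destroy(): void;
}

interface ReloadConfig {
  startPosition: number; // existing hls.js config option
  preferredAudioPid: number; // HYPOTHETICAL: not an hls.js option today
}

type PlayerFactory = (config: ReloadConfig) => PlayerLike;

function switchAudioByReload(
  current: PlayerLike,
  media: MediaLike,
  playlistUrl: string,
  audioPid: number,
  createPlayer: PlayerFactory
): PlayerLike {
  const position = media.currentTime; // store playback position
  media.pause(); // stop playback
  current.destroy(); // destroy the old instance
  const next = createPlayer({
    startPosition: position, // resume where playback stopped
    preferredAudioPid: audioPid, // hypothetical preference
  });
  next.loadSource(playlistUrl); // reload the same playlist
  next.attachMedia(media);
  return next;
}
```

With the real library, the factory would be something like `(config) => new Hls({ startPosition: config.startPosition })`, plus the PID preference once such an option exists.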
That's what I meant, and why I said it would be non-invasive to the inner workings and not a big thing to implement. Given that this is such a tiny addition, I wouldn't say that it matters whether Safari can do it or not. What do you think?
@robwalch - Hi Rob, what are your thoughts about my ~~latest~~ most recent comment, detailing the basic idea how this could be done?
I would expect this to work similarly to how the native <video> handles media with multiple audio streams, they are available using the video.audioTracks property. Each track includes properties such as label/language. The enabled property also allows the track to be toggled on/off. I don't even think the HLS.js API would need to be changed.
> I would expect this to work similarly to how the native <video> handles media with multiple audio streams, they are available using the video.audioTracks property...
Hi @deadbeef84,
This issue is specifically about audio streams in MPEG-2 TS segments. It is not dependent on supporting multiple HTMLMediaElement audioTracks.
If you would like HLS.js to map HLS audio MEDIA playlist options to HTMLMediaElement audioTracks, please file a feature request.
> Hi Rob, what are your thoughts about my ~~latest~~ most recent comment, detailing the basic idea how this could be done?
Hi @softworkz,
WRT https://github.com/video-dev/hls.js/issues/3931#issuecomment-848058232, now that we have an audioPreference config option, we could consider accepting a change that extended those options to pick an alternate audio PID. I think it would have to be an explicit setting for MPEG-TS audio so that it does not change the default behavior which aligns with other HLS clients (using the first).
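
One way to read "an explicit setting that does not change the default" is sketched below. The `tsAudioPid` field is hypothetical and not part of hls.js; `lang` is shown only to suggest that the field would sit next to the existing audioPreference selection fields. Without the explicit field, the first audio stream listed in the PMT is still used.

```typescript
// HYPOTHETICAL extension of the audio preference options; `tsAudioPid`
// does not exist in hls.js and is named here only for illustration.
interface AudioPreferenceSketch {
  lang?: string; // stands in for the existing selection fields
  tsAudioPid?: number; // explicit opt-in for MPEG-TS audio
}

// `pmtAudioPids` is the list of audio PIDs in the order the PMT lists them.
function resolveTsAudioPid(
  pmtAudioPids: number[],
  pref?: AudioPreferenceSketch
): number | undefined {
  const explicit = pref?.tsAudioPid;
  if (explicit !== undefined && pmtAudioPids.indexOf(explicit) !== -1) {
    return explicit; // only an explicit, valid PID overrides the default
  }
  return pmtAudioPids[0]; // default unchanged: the first audio stream
}
```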