shaka-player icon indicating copy to clipboard operation
shaka-player copied to clipboard

Allow opt-in to audio codec changes (on request) via media source reload

Open theRealRobG opened this issue 2 years ago • 10 comments

Have you read the FAQ and checked for duplicate open issues?

Related issue:

  • https://github.com/shaka-project/shaka-player/issues/1528

Is your feature request related to a problem? Please describe.

With our application we have some multi-language streams available that also include tracks in both stereo and surround; however, the surround tracks are ec-3 while the stereo tracks are mp4a.40.2, and furthermore not all languages are available in both surround and stereo.

For example we may have 4 tracks as follows:

  • English Surround
  • English Stereo
  • English Audio Description Stereo
  • Spanish Stereo

In the above example, as things stand right now, if we choose to present surround audio then the user will not have access to Spanish or English Audio Description tracks (which may be legally required), or alternatively we can restrict to stereo for a more full selection but then a degraded experience for users selecting English audio.

Describe the solution you'd like

Issue #1528 specifically deals with the introduction of seamless codec switching via the use of the new changeType method on the SourceBuffer object. While this is desired, sometimes we would like to be able to opt-in to a non-seamless switching approach, as the functionality it would open up (e.g. supporting multi-language with incomplete multi-codec audio) is deemed worth any jarring transtions. Furthermore, there will always be some legacy devices where either changeType is not implemented, or the behaviour is unstable, which may mean that we would prefer to force media reloads over codec changes with these problem devices.

As such we believe that the ability to opt-in to allowing selection of tracks with codec that is not the initial choice should be allowed as configuration. In the scenario that the user has opted into codec switching behaviour they will be presented with all available tracks as per the manifest (even those that do not match the initial codec choice) and a selection of a track from a different codec would engage a new codec switching behaviour. Shaka would have to manage an understanding of a difference between "notified tracks" vs "available streams" for selection such that the codec switching behaviour can be initiated when a non-existent stream is chosen.

The proposal for the opt-in behaviour would be a new enum property on the shaka.extern.PlayerConfiguration as described below:

/**
 * @enum {string}
 */
shaka.extern.CodecSwitchingStrategy = {
  // No codec switching allowed (i.e. current Shaka behaviour).
  DISABLED: 'disabled',
  // Allow codec switching which will always involve reloading the
  // `MediaSource`.
  RELOAD: 'reload',
  // Allow codec switching; determine if `SourceBuffer.changeType` is available
  // and attempt to use this first, but fall back to reloading `MediaSource` if
  // not available / fails.
  //
  // This case can be implemented as a future improvement once #1528 is
  // completed in order to reduce the scope of this change.
  SMOOTH: 'smooth',
};

As noted the SMOOTH case can be left out in the initial implementation so as to reduce the scope (we would be happy with just allowing the RELOAD functionality) and re-approached at a later date as an improvement (once #1528 is resolved).

In order to describe the desired behaviour of the RELOAD strategy I provide below a concrete example to illustrate handling of an audio codec change:

  • Stream is loaded with following audio: English in mp4a.40.2, English in ec-3, Spanish in mp4a.40.2.
  • Desired audio configuration results in English (ec-3) being chosen first.
  • User is presented with choices for both English and Spanish.
  • User selects Spanish audio track.
  • Shaka checks against the CodecSwitchingStrategy and determines that the RELOAD case is active for the session.
  • Shaka destroys the MediaSource object and associated MSE objects
  • Shaka re-creates MediaSource with new SourceBuffer objects representing the new codec selection and updates the src on the video element to reference the new MediaSource object.
  • Shaka maintains the existing MediaKeySession object to be used with the new MediaSource.
  • Shaka updates internal stream representations to track the new codec change.

Some extra notes:

  • There are other use-cases for codec switching outside of user selection such as mis-matched adaptations between two adjacent periods; however this proposal considers the automatic handling of discontinuous streams (e.g adverts with mis-matching codecs compared with the main stream) by initiating codec switching behaviour is out of scope for this request. This request is focused on allowing user selection of all available and playable content in a given manifest. Perhaps this means we should provide a better configuration name (happy for suggestions).
  • Given that this is focused on an issue that is related to audio track selection for us, we may consider that we restrict this to just audio codec changes, but then we also feel that the changes may be generic enough that restricting to just audio may be an unnecessary complication. That being said, the use-case for user selected multi-codec video sounds pretty rare, so maybe it's not even worth considering. Again, happy for suggestions, and whether we should update the configuration name to be more clearly intended for audio codec changes with user selection.

Describe alternatives you've considered

To achieve this we could just dispose the current instance of Shaka player and re-instantiate a completely new one with an updated audio selection configuration; however, this is seen as wasteful, as it would result in unnecessary manifest requests and another licence acquisition journey (which may be relatively expensive) with the new MediaKeySession that will be necessary. It feels like this whole transition would be better managed within Shaka to be more efficient.

Alternatively we could try and force our stream configurations to always have every viable user choice of language/characteristics transcoded to all available codecs; but this puts a burden on our streaming platform for what seems like a not unlikely scenario that could be handled via the player as many other players do.

Additional context

Some precedent for a solution is provided by canal-plus@rx-player.

RX Player example configuration for codec switch behaviour: https://developers.canal-plus.com/rx-player/doc/api/Loading_a_Content.html#oncodecswitch

RX Player example of choosing an adaptation switch strategy (related to codec switching): https://github.com/canalplus/rx-player/blob/master/src/core/stream/period/get_adaptation_switch_strategy.ts

theRealRobG avatar Aug 01 '22 14:08 theRealRobG

Happy to do the work on this, but given this topic has a lot of history and some active development against it (as referenced in the related issue), I believe a conversation should be had about this to see if the idea fits the longer-term ideas of the project / over-arching epic of allowing codec switching.

I think, given the need to support legacy devices will always preclude some from using SourceBuffer.changeType, that this feature is separate enough from the introduction of that feature... Though of course the design goals of that will heavily influence how a "reload" codec switch may be introduced.

theRealRobG avatar Aug 01 '22 14:08 theRealRobG

I'm very interested in this, but need to find time to give it the proper consideration. This issue is in my queue, though, and I'll give it a full read and a response as soon as I can.

joeyparrish avatar Aug 01 '22 16:08 joeyparrish

  • Shaka maintains the existing MediaKeySession object to be used with the new MediaSource.

Unfortunately, I'm not certain this part would work. I could be wrong, so we can experiment and find out, but I think that when you reload MediaSource and change the src on a video element to point to the new MediaSource object, you lose the MediaKeys association. Again, I'm not 100% certain that it works that way, but I expect so.

Some extra notes:

  • There are other use-cases for codec switching outside of user selection such as mis-matched adaptations between two adjacent periods; however this proposal considers the automatic handling of discontinuous streams (e.g adverts with mis-matching codecs compared with the main stream) by initiating codec switching behaviour is out of scope for this request. This request is focused on allowing user selection of all available and playable content in a given manifest. Perhaps this means we should provide a better configuration name (happy for suggestions).
  • Given that this is focused on an issue that is related to audio track selection for us, we may consider that we restrict this to just audio codec changes, but then we also feel that the changes may be generic enough that restricting to just audio may be an unnecessary complication. That being said, the use-case for user selected multi-codec video sounds pretty rare, so maybe it's not even worth considering. Again, happy for suggestions, and whether we should update the configuration name to be more clearly intended for audio codec changes with user selection.

I agree with all of that, except I think that the name you suggested applies to multi-period just fine IMO. Agree that we can leave it out of scope for now.

I think we must be careful to restrict this in a way that reloads are never allowed in ABR decisions, no matter what setting you have enabled. A user should be willing to accept a brief interruption in playback when changing languages, because that's a user-driven action. But that would not be acceptable when the player notices a bandwidth change.

Alternatively we could try and force our stream configurations to always have every viable user choice of language/characteristics transcoded to all available codecs; but this puts a burden on our streaming platform for what seems like a not unlikely scenario that could be handled via the player as many other players do.

I used to believe this was the streaming platform's job, but I think now that I was wrong. The Player should be flexible, and some options should be given to the application to make an appropriate trade-off.

To achieve this we could just dispose the current instance of Shaka player and re-instantiate a completely new one with an updated audio selection configuration; however, this is seen as wasteful, as it would result in unnecessary manifest requests and another licence acquisition journey (which may be relatively expensive) with the new MediaKeySession that will be necessary. It feels like this whole transition would be better managed within Shaka to be more efficient.

I hope I am wrong, and that you can keep MediaKeys alive while reloading MediaSource. Otherwise, you may not get so much benefit from this proposal. I think the first thing you should do if you want to pursue this is determine whether or not you can keep MediaKeys when tearing down MediaSource.

I suggest you should write a test case for this and send a PR. I can trigger it to run on all the supported platforms in our lab, which should give us more thorough results than just running it on Chrome. If it works on Chrome, but not on, say, Tizen, the implementation could get much more complicated, or we might need to restrict the flag to only certain platforms.

Thoughts?

joeyparrish avatar Aug 01 '22 21:08 joeyparrish

Thanks for the quick response and detailed feedback.

You are right that a lot of the benefit of this feature depends on being able to maintain the MediaKeys while reloading the MediaSource.

Internally we've managed to demonstrate this concept works. For other (resiliency) reasons we have implemented a basic "retry stream" workflow that successfully maintains the keys and sessions. I've outlined what we did to achieve this in the collapsed section below.

media key session preservation

Catching up with what was done on our end, I can see that we added a property to the DrmConfiguration that indicates whether or not Shaka should preserveMediaKeySessions. The rest of the solution is pretty much entirely within shaka.media.DrmEngine in drm_engine.js.

When true this would prompt an early return in the closeOpenSessions_ private method within drm_engine.js: https://github.com/shaka-project/shaka-player/blob/main/lib/media/drm_engine.js#L1781-L1814

Which essentially means that we do not close (or remove) the MediaKeySession object.

We then added methods for extracting (a newly created type) "MediaKeysData":

/**
  * @typedef {{
  *   mediaKeysInstance: ?MediaKeys,
  *   activeSessions: ?Map.<MediaKeySession, Uint8Array>
  * }}
  *
  * @property {?MediaKeys} mediaKeysInstance
  *   A MediaKeys instance which can be provided and will be used instead of
  *   creating a new instance in the DRM Engine.
  * @property {?Map.<MediaKeySession, Uint8Array>} activeSessions
  *   A Map of active Media Key Sessions and their ID
  *
  * @exportDoc
  */
 shaka.extern.MediaKeysData;

As such:

/**
 * Get the MediaKeys and active Media Key Sessions
 *
 * @return {shaka.extern.MediaKeysData}
 */
getMediaKeysData() {
  /** @type {Map<MediaKeySession, Uint8Array>} */
  const adaptedSessions = new Map();
  const activeSessions = Array.from(this.activeSessions_.entries());
  for (const [session, metadata] of activeSessions) {
    adaptedSessions.set(session, metadata.initData);
  }
  return {
    mediaKeysInstance: this.mediaKeys_,
    activeSessions: adaptedSessions,
  };
}

Then later (when we close one Shaka instance and start a new one), we provide a way to inject this into a new DrmEngine instance (setting MediaKeys on this.mediaKeys_ in construction) and adapt the activeSessions back for the new session (which gets set as the this.activeSessions_ if exists in constructor):

/**
 * Adapt external active session data
 * @param {Map<MediaKeySession, Uint8Array>} sessions
 * @return {!Map.<MediaKeySession, shaka.media.DrmEngine.SessionMetaData>}
 * @private
 */
adaptExternalActiveSessions_(sessions) {
  /** @type {!Map<MediaKeySession, shaka.media.DrmEngine.SessionMetaData>} */
  const adaptedSessions = new Map();

  const activeSessions = Array.from((sessions).entries());
  for (const [session, initData] of activeSessions) {
    adaptedSessions.set(session, {
      loaded: true,
      initData,
      session,
      oldExpiration: Infinity,
      updatePromise: null,
    });
  }
  return adaptedSessions;
}

Later on within queryMediaKeys_ we run through until setting the this.currentDrmInfo_, but then early return before creating new mediakeys (since this.mediaKeys_ was already set from constructor).

Everything else essentially remains unchanged and playback is able to continue without an extra external licence challenge being generated.

The code seemed pretty specific for our use-case and so the team didn't contribute back, but the principle of maintaining an active MediaKeysSession between two playback sessions is demonstrated, and so could be adapted to be used as part of this proposal.


This seems to work on the devices we have tested so far, but that being said, there is no guarantee that it works for all devices and so it would be really appreciated to have visibility on a wide range of devices if we could get that via a device lab test.

I suggest you should write a test case for this and send a PR. I can trigger it to run on all the supported platforms in our lab, which should give us more thorough results than just running it on Chrome.

That sounds ideal. Just to clarify, can we submit a PR without the intention of merging it, just so that we could get a run triggered with the device lab? I imagine we could have a demo app example that would trigger a reload of the src (on some command) while maintaining the media keys and then assert that another DRM challenge is not created; does that sound like a reasonable proof of concept?

I think we must be careful to restrict this in a way that reloads are never allowed in ABR decisions, no matter what setting you have enabled. A user should be willing to accept a brief interruption in playback when changing languages, because that's a user-driven action. But that would not be acceptable when the player notices a bandwidth change.

Agreed 👍

If it works on Chrome, but not on, say, Tizen, the implementation could get much more complicated, or we might need to restrict the flag to only certain platforms.

That's an interesting issue. Is your concern that it may lead to playback failure or more that it may lead to an unexpected DRM licence challenge? If the latter is the case, then I personally feel that it can be documented that there is a known risk that on some platforms a new DRM challenge may be initiated on the reload (best efforts to documented known platforms would be good), and then it is up to the consumer to allow or not; I can imagine some apps will be fine with the mid-stream licence challenge with others not, but still having the logic encapsulated within the player will be helpful (especially given the hope is that it works on most platforms 🤞 )... Happy to hear opinions otherwise. However, if the former is the case (playback failures), then that probably isn't acceptable risk... In that case how would you foresee restricting the flag? Would it just be a no-op on platforms that have known issues?

theRealRobG avatar Aug 03 '22 04:08 theRealRobG

That sounds ideal. Just to clarify, can we submit a PR without the intention of merging it, just so that we could get a run triggered with the device lab?

Yes, exactly. Maintainers can explicitly trigger a lab run on any PR. It would be equivalent to running ./build/test.py, but on a Selenium grid full of interesting devices instead of just on your local browsers.

I imagine we could have a demo app example that would trigger a reload of the src (on some command) while maintaining the media keys and then assert that another DRM challenge is not created; does that sound like a reasonable proof of concept?

I'm not sure if that works for the testing workflow we already have. If you can do this in a Jasmine test in test/_unit.js or test/_integration.js, such that it is run by build/test.py, then we can easily test it for you in the lab.

That's an interesting issue. Is your concern that it may lead to playback failure or more that it may lead to an unexpected DRM licence challenge?

More that it would result in a failure. We've seen weird things with EME integrations on smart TVs and other devices. For example, we have some weird workarounds to rebuild init segments for certain platforms like Tizen and Xbox. They require an init segment to signal encryption, whereas browsers only require MediaKeys to be set up in advance. Xbox would ignore MediaKeys if the first init segment didn't signal encryption, and if a later init segment signaled encryption... too late, won't decrypt, decode failure on encrypted data.

If it's opt-in, though, as you mentioned with your change to DrmConfiguration, that could still work. The opt-in value could be default true in places where we know it works, and default false otherwise. Nobody has to touch it to get good results, and anybody can explicitly set it.

but still having the logic encapsulated within the player will be helpful (especially given the hope is that it works on most platforms 🤞 )

Agreed!

joeyparrish avatar Aug 05 '22 22:08 joeyparrish

We've seen weird things with EME integrations [...]

Makes sense and totally agree; I also saw the strange Xbox behaviour with regards to DRM signalling (fun times delivering pre-roll ads 😄 ).

The opt-in value could be default true in places where we know it works, and default false otherwise. Nobody has to touch it to get good results, and anybody can explicitly set it.

That sounds ideal 👍

I'm not sure if that works for the testing workflow we already have. If you can do this in a Jasmine test in test/_unit.js or test/_integration.js, such that it is run by build/test.py, then we can easily test it for you in the lab.

Makes sense! I'll look to get something written up shortly.

theRealRobG avatar Aug 08 '22 14:08 theRealRobG

@theRealRobG , do you have any updates to share?

avelad avatar Aug 26 '22 05:08 avelad

@avelad Sorry for the silence. Just after my message a lot of high priority issues came up at work and I haven't had any capacity to implement the integration tests (neither has anyone else in our team).

I might have a moment today actually so I'll see what I can do and let you know.

theRealRobG avatar Aug 26 '22 18:08 theRealRobG

Hi @joeyparrish @avelad ,

Sorry for the delay. I've introduced an integration test within test/media/drm_engine_integration.js here: https://github.com/shaka-project/shaka-player/pull/4456

I notice that it is failing the PR naming convention... I can change that to prefix with test: if you like? I just prefixed it with [DO NOT MERGE] to hammer home that I don't intend the PR to be code that gets merged back in.

I've ran the test on Chrome and I believe it is demonstrating what I claim, but please could you take a quick look at the steps of the test (drmIt('re-uses drm session for subsequent playback'), just to be sure that I am properly destroying and re-creating the objects that we would expect to in order to prove usage with a new media source? I also verified that it works if I completely remove the video from the document, create a new one, and append it back, so I am pretty sure it works either way... But I decided to keep it to just destroying the various engines as I imagine that is closer to what we will actually want in a real solution.

Please also note that I haven't considered how this works exactly with the feature suggestion above (i.e. where and when destroy gets called and how things get recreated)... But I just wanted to focus on proving that the re-use of media key session was possible with MSE+EME API across devices.

Hopefully this test is successful; assuming that it is we likely have time to implement the audio suggestions sometime later in September.

theRealRobG avatar Aug 30 '22 19:08 theRealRobG

No worries on the PR naming convention. It prevents merge, but this wasn't meant to be merged as-is. And if you want to clean it up for merge later, you can just rename the PR at that time.

joeyparrish avatar Aug 30 '22 20:08 joeyparrish