ExoPlayer
Support low latency custom audio signal processing
Currently ExoPlayer plays using an AudioTrack configured with a significant (>250ms) buffer. This is done to avoid underruns and works well for most of our users. Nevertheless, it prevents some use cases that require low latency signal processing.
Goal
Such high latency is an issue when modifying the media (usually the audio) depending on low latency information (e.g. spatialized audio must react to user head movement with as low latency as possible). This issue only considers the audio pipeline, as (I believe) low latency video DSP can already be implemented using shaders. This was requested in #8962, #8722, #8665 (and possibly others).
This issue is NOT about real time media playback (e.g. a video call). End-to-end low latency playback is not a goal of ExoPlayer. Making all of ExoPlayer's pipelines low latency would need a deep redesign and would negatively affect power efficiency.
Solutions
To allow apps to inject custom Audio Digital Signal Processing (ADSP) in ExoPlayer and have minimal latency on the playback, the audio produced by the ADSP needs to be played to the user as fast as possible.
ExoPlayer has very little internal buffering after the ADSP stage, so most of the latency occurs after the write to the AudioTrack.
There are two main ways to reduce the latency of ExoPlayer's post-ADSP playback:
- reduce the AudioTrack buffer size and switch to performance mode rather than power efficiency mode.
- use Oboe to play instead of AudioTrack.
AudioTrack tuning
Having a low latency AudioTrack configuration (solution 1) is far simpler to implement (see https://github.com/google/ExoPlayer/issues/8665#issuecomment-789595598) and should greatly improve the situation.
Nevertheless, it will not allow latency to be as low as Oboe's. As a very rough estimate, I would guess an optimised AudioTrack (with the fast mixer) could reach ~20ms. This is of course extremely dependent on the device (@dturner might have better estimates).
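To make the buffer-size/latency tradeoff concrete, here is a back-of-the-envelope calculation. All numbers below are illustrative assumptions, not measurements from any device or from ExoPlayer's actual configuration:

```java
// Rough latency contributed by an AudioTrack buffer of a given size.
// Illustrative only: a 48 KiB buffer of 48 kHz stereo int16 PCM holds ~256 ms.
public class BufferLatency {
  static double bufferLatencyMs(
      int bufferSizeBytes, int sampleRateHz, int channelCount, int bytesPerSample) {
    int bytesPerFrame = channelCount * bytesPerSample;
    double frames = (double) bufferSizeBytes / bytesPerFrame;
    return frames * 1000.0 / sampleRateHz;
  }

  public static void main(String[] args) {
    System.out.println(bufferLatencyMs(48 * 1024, 48_000, 2, 2)); // prints 256.0
  }
}
```

A low latency configuration would shrink this buffer toward the device's reported minimum, trading underrun headroom and power efficiency for latency.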
Oboe
An Oboe extension, on the other hand, could reach the best possible latency, as low as ~10ms on devices supporting AAudio MMAP NOIRQ. Nevertheless, this would require significant work to interface between Oboe and ExoPlayer, plus tuning and maintenance, so it should only be done if a low enough latency can't be achieved with AudioTrack tuning.
Adding latency before ADSP
Whatever solution is used, ExoPlayer's current AudioTrack buffer size was chosen because such a big buffer smooths out decoding speed jitter (decoders sometimes decode in bursts). With a low latency path from the decoder output to the audio output, ExoPlayer runs a high risk of underrunning every time the decoder takes longer than one packet duration to decode. This could manifest itself, for example, as an audio glitch whenever a CPU burst occurs (e.g. the user scrolls a page).
To avoid that, a buffer should be added between the audio decoder and the ADSP, leaving the decoder-to-output latency unaffected by the low latency AudioSink:
- before: Decoder -> ADSP --[big buffer (AudioTrack)]--> Speaker
- after : Decoder --[big buffer (ExoPlayer)]--> ADSP -[low latency AudioTrack]-> Speaker
Jitter smoothing buffer implementation
This new buffer could take the form of an AudioProcessor to easily remove it from the processing chain, or it could be part of DefaultAudioSink.
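As a rough illustration of what such a jitter-smoothing buffer does, here is a minimal FIFO sketch: the decoder writes into it in bursts, and the low latency sink drains it at a steady rate. The class and method names are hypothetical, not ExoPlayer APIs:

```java
import java.nio.ByteBuffer;

// Hypothetical jitter-smoothing FIFO sitting between the decoder and the ADSP
// stage. Real code would also need thread safety and end-of-stream handling.
public class JitterBuffer {
  private final byte[] data;
  private int readPos;
  private int size;

  JitterBuffer(int capacityBytes) {
    data = new byte[capacityBytes];
  }

  /** Queues as much of {@code in} as fits; the decoder writes here in bursts. */
  int write(ByteBuffer in) {
    int written = 0;
    while (in.hasRemaining() && size < data.length) {
      data[(readPos + size) % data.length] = in.get();
      size++;
      written++;
    }
    return written;
  }

  /** Drains up to {@code out.remaining()} bytes; the low latency path reads here. */
  int read(ByteBuffer out) {
    int read = 0;
    while (out.hasRemaining() && size > 0) {
      out.put(data[readPos]);
      readPos = (readPos + 1) % data.length;
      size--;
      read++;
    }
    return read;
  }

  public static void main(String[] args) {
    JitterBuffer buf = new JitterBuffer(8);
    buf.write(ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5})); // bursty write of 5 bytes
    ByteBuffer out = ByteBuffer.allocate(3);
    System.out.println(buf.read(out)); // steady drain of 3 bytes; prints 3
  }
}
```

Wrapping this logic in an AudioProcessor (queue in queueInput, drain in getOutput) would make it easy to drop out of the chain when low latency mode is disabled.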
Additional information
Overview of ExoPlayer's audio pipeline: https://github.com/google/ExoPlayer/issues/8722#issuecomment-801925912
Previous discussion on minimising ADSP latency: https://github.com/google/ExoPlayer/issues/8665#issuecomment-789595598
@krocard @dturner As you suggested, I have implemented an AudioProcessor to sink the audio data to Oboe. I found that the logic of getting the current audio playback position from AudioTrack is too complex, so, like a tee audio processor, I sink the audio data to the Oboe player and, at the same time, sink silent PCM data to AudioTrack, so that the A/V sync module can work as before. And I assume the Oboe player's latency is low enough to keep A/V in sync.
But the implementation currently has two problems that I need your help with, thanks!
- Now the audio MediaCodec decoder output PCM format is int16, but Oboe's AudioStreamCallback::onAudioReady needs float PCM. I tried implementing the conversion in many ways (code), but the output audio still has some noise; I guess the conversion method is wrong. I'm not familiar with the layout of the different PCM formats. How should the conversion work?
- @dturner I post all the event messages to the same handler thread to call the Oboe player's API (code), but the threads on which onAudioReady is called are always different. Is this correct? Also, when I release the ExoPlayer and want to stop the Oboe audio player, it doesn't succeed. Hope you can give some help!
The implementation's repo is here, in samples/cts-oboe. If you have time, please review the code and give some advice. Thanks very much!
> Now the audio MediaCodec decoder output PCM format is int16, but Oboe's AudioStreamCallback::onAudioReady needs float PCM. I tried implementing the conversion in many ways (code), but the output audio still has some noise; I guess the conversion method is wrong. I'm not familiar with the layout of the different PCM formats. How should the conversion work?
The easiest way to solve this issue would be to enable float playback in DefaultAudioSink using enableFloatOutput. Nevertheless, currently custom processing is only used on the int path:
https://github.com/google/ExoPlayer/blob/029a2b27cbdc27cf9d51d4a73ebeb503968849f6/library/core/src/main/java/com/google/android/exoplayer2/audio/DefaultAudioSink.java#L449
I don't know why we have such a restriction. It seems easy to also enable it for float if needed.
But even if float processing was enabled, you would still have the issue that float currently is only used for high quality input formats (>24bit).
The easiest way to solve your issue is to surround your OboeProcessor with a FloatResamplingAudioProcessor before and a ResamplingAudioProcessor (which resamples to int16, as its name doesn't indicate :) after.
Such as: new DefaultAudioProcessingChain(new FloatResamplingAudioProcessor(), new OboeProcessor(), new ResamplingAudioProcessor()).
> I found that the logic of getting the current audio playback position from AudioTrack is too complex, so I made something like a tee audio processor to sink the audio data to the Oboe player
Would you mind giving us more detail about this? I'm not sure I understand the issue you faced. Thanks.
> and at the same time, sink silent PCM data to AudioTrack, so that the A/V sync module can work as before. And I assume the Oboe player's latency is low enough to keep A/V in sync.
The video will sync with the AudioTrack, which has a much bigger latency than Oboe. I'm afraid the audio will be ahead of the video.
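To illustrate the concern with rough, assumed numbers (neither figure is measured): if the video clock follows the AudioTrack position while the audio actually leaves the device via Oboe, the audio leads by roughly the difference of the two output latencies:

```java
// Hypothetical A/V offset when video syncs to AudioTrack but audio plays via Oboe.
public class AvOffset {
  static double audioLeadMs(double audioTrackLatencyMs, double oboeLatencyMs) {
    return audioTrackLatencyMs - oboeLatencyMs;
  }

  public static void main(String[] args) {
    // Assumed figures: ~250 ms big-buffer AudioTrack vs ~20 ms tuned Oboe stream.
    System.out.println(audioLeadMs(250, 20)); // prints 230.0
  }
}
```

An offset of that magnitude is well above typical lip-sync tolerance, which is why syncing against the Oboe stream's own timestamps (or replacing the sink entirely) matters.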
Fully replacing DefaultAudioSink with an Oboe AudioSink would avoid this issue, but your smart workaround of using an AudioProcessor diversion might be good enough for your use case.
> But even if float processing was enabled, you would still have the issue that float currently is only used for high quality input formats (>24bit). The easiest way to solve your issue is to surround your OboeProcessor with a FloatResamplingAudioProcessor before and a ResamplingAudioProcessor (which resamples to int16, as its name doesn't indicate :) after.
As you said, the FloatResamplingAudioProcessor is only used for high quality input formats (>24bit). For my content, the decoder output format is always ENCODING_PCM_16BIT, so this isn't convenient for this case (converting from int16 to float).
> Such as: new DefaultAudioProcessingChain(new FloatResamplingAudioProcessor(), new OboeProcessor(), new ResamplingAudioProcessor()).
In the design the pipeline is :
audio decoder output --> Processor1 --> Processor2 -> ..... -> OboeProcessor --> Oboe Audio Player / silence PCM to AudioTrack
So the OboeProcessor is the last node of the pipeline.
> Now the audio MediaCodec decoder output PCM format is int16, but Oboe's AudioStreamCallback::onAudioReady needs float PCM. I tried implementing the conversion in many ways (code), but the output audio still has some noise; I guess the conversion method is wrong. I'm not familiar with the layout of the different PCM formats. How should the conversion work?
I looked at your code, and the conversion seems correct. What kind of noise are you hearing?
I would implement it as such:
for (int i = position; i < limit; i += 2) {
  // Assemble a little-endian 16-bit sample from two consecutive bytes.
  short value =
      (short)
          ((inputBuffer.get(i) & 0xFF)
              | ((inputBuffer.get(i + 1) & 0xFF) << 8));
  // Scale to [-1.0, 1.0].
  audioDataF[(i - position) / 2] = ((float) value) / Short.MAX_VALUE;
}
inputBuffer.position(inputBuffer.limit());
But it should be equivalent to your solution.
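For reference, here is a self-contained version of the same little-endian int16-to-float conversion, exercised on two known samples (the class name Int16ToFloat is just for this example):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Standalone sketch of little-endian int16 PCM -> float PCM conversion.
public class Int16ToFloat {
  static float[] convert(ByteBuffer in) {
    float[] out = new float[in.remaining() / 2];
    for (int s = 0; s < out.length; s++) {
      // Two bytes per sample, least significant byte first.
      short value =
          (short) ((in.get(2 * s) & 0xFF) | ((in.get(2 * s + 1) & 0xFF) << 8));
      out[s] = ((float) value) / Short.MAX_VALUE;
    }
    return out;
  }

  public static void main(String[] args) {
    ByteBuffer pcm = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
    pcm.putShort(Short.MAX_VALUE).putShort((short) 0);
    pcm.flip();
    float[] f = convert(pcm);
    System.out.println(f[0] + " " + f[1]); // prints "1.0 0.0"
  }
}
```

Getting the stride (2 bytes per int16 sample) and the output index (one float per sample, not per byte) right is exactly where noise-producing bugs tend to hide in this kind of conversion.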
Any further progress on supporting Oboe? Thanks
Up! I'm curious to know how far this work went.