webrtc-stats
Measuring background noise (energy)
As more and more offices adopt open workplaces, it is important to measure the background noise in open office environments and its effect on real-time communication. Is it possible for browsers to measure this metric?
@henbos Is the browser (audio stack) able to tell the difference between speaking and non-speaking audio energy?
@vr000m Lots of ML work on separating "noise" from speakers. For example, see: https://devblogs.nvidia.com/nvidia-real-time-noise-suppression-deep-learning/
Related: Issue https://github.com/w3c/webrtc-stats/issues/383
@vr000m @henbos Is this worth discussing at TPAC?
Implementations may be doing several things to improve quality, like echo cancellation, synthesizing samples to conceal packet loss, and noise suppression. Strategies are, however, implementation-specific. This can make standardizing metrics around it difficult.
We could add a metric to say that this is the implementation's estimate of the current background noise levels. But different implementations may do different things to attempt to calculate this, which could potentially yield different numbers in different scenarios.
This reminds me of the attempt to standardize an echo likelihood metric, which got iceboxed.
But hey, an experimental metric might be better than no metric, even if we can't yet guarantee interoperable estimates?
Action Item: Talk to an audio engineer to see if this is something that we could measure :)
@ivocreusen Do we have any estimates of background noise that could be exposed as stats? E.g. totalBackgroundAudioEnergy that would be <= totalAudioEnergy?
This is another issue I would like to talk to you about :)
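To make the proposal above concrete, here is a sketch of how such a stat could be accumulated. The `totalAudioEnergy` accumulation follows the spec's general shape (sum of squared normalized sample values times sample duration); `totalBackgroundAudioEnergy` is hypothetical and does not exist in the spec, and the `isSpeech` flag stands in for whatever implementation-specific detector decides what counts as background.

```javascript
// Sketch: spec-style totalAudioEnergy accumulation, plus a hypothetical
// totalBackgroundAudioEnergy that only grows while a voice-activity
// detector (not modeled here) reports "no speech".
function accumulateEnergy(samples, sampleRate, isSpeech, totals) {
  const dt = 1 / sampleRate; // duration of one sample, in seconds
  for (const s of samples) {
    const e = s * s * dt; // samples assumed normalized to [-1, 1]
    totals.totalAudioEnergy += e;
    if (!isSpeech) totals.totalBackgroundAudioEnergy += e; // hypothetical stat
  }
  return totals;
}

// Example: one second of constant 0.5-amplitude "noise" at 8 kHz.
const totals = { totalAudioEnergy: 0, totalBackgroundAudioEnergy: 0 };
accumulateEnergy(new Float32Array(8000).fill(0.5), 8000, false, totals);
// totals.totalAudioEnergy ≈ 0.25, and since isSpeech is false,
// totalBackgroundAudioEnergy accumulates the same amount.
```

By construction, `totalBackgroundAudioEnergy <= totalAudioEnergy` always holds, which is the invariant suggested above.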
I think that the hard part for a standardized metric for background noise is defining what part of the signal is background and what part is not. We do have a background noise estimate in the gain controller, but if we were to standardize how it's computed we won't be able to easily make changes/improvements (without adding unnecessary additional computations).
Would it be an option to have a background noise metric without specifying how to decide what part of the signal is background?
I would propose that the metric be well defined in terms of what it measures (audio energy), without specifying how the measurement is obtained. And then declare inaccurate estimates implementation bugs rather than spec bugs. Fingers crossed?
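Under that proposal, an app would not need to know how the estimate is obtained, only how to consume it. The snapshot-differencing below is plain JS; the `getStats()` wiring (in comments) is browser-only, and `totalBackgroundAudioEnergy` remains a hypothetical field next to the existing `totalAudioEnergy`.

```javascript
// Given two stat snapshots, estimate what fraction of recently received
// audio energy the implementation attributed to background noise.
// totalBackgroundAudioEnergy is a hypothetical stat, not in the spec.
function backgroundFraction(prev, curr) {
  const dE = curr.totalAudioEnergy - prev.totalAudioEnergy;
  const dB = curr.totalBackgroundAudioEnergy - prev.totalBackgroundAudioEnergy;
  return dE > 0 ? dB / dE : 0; // share of recent energy that was background
}

// Browser wiring (assumed, not runnable here):
//   const report = await pc.getStats();
//   ...find the audio "inbound-rtp" entry and read its energy fields...

const prev = { totalAudioEnergy: 1.0, totalBackgroundAudioEnergy: 0.2 };
const curr = { totalAudioEnergy: 1.5, totalBackgroundAudioEnergy: 0.4 };
const fraction = backgroundFraction(prev, curr); // roughly 0.4
```

Because both counters are cumulative, polling and differencing like this yields a windowed estimate regardless of how the implementation computed the split.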
This may work... assuming there are three classes of background noise detectors:
- very aggressive -- these would have a low tolerance and end up marking low-noise environments (and anything above) as noisy.
- very pessimistic -- these would have a high tolerance and would only mark very noisy environments; if they are really bad estimators, they may never mark anything because noise and speech would be indistinguishable.
- something in between... I think this is what most implementations will end up doing, and we probably live with these kinds of estimators elsewhere in the stack (for example, quality/CPU limitation, etc.).
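The three classes above can be reduced to a single knob in a toy model: the energy threshold at which a frame counts as "noisy". The thresholds and frame energies here are made-up illustrative values, not anything an implementation actually uses.

```javascript
// Toy model of the three detector classes: only the threshold differs.
function classifyFrames(frameEnergies, threshold) {
  return frameEnergies.map(e => e > threshold);
}

const frames = [0.01, 0.05, 0.2, 0.8]; // quiet ... very noisy (made up)
const aggressive = classifyFrames(frames, 0.02);  // flags almost everything
const pessimistic = classifyFrames(frames, 0.5);  // flags only extreme noise
const balanced = classifyFrames(frames, 0.1);     // somewhere in between
```

The same input yields three different "noisy" verdicts, which is exactly the interoperability concern raised earlier in the thread.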
Talked to an audio engineer, his take roughly:
- Not sure about the use case of this in a web conferencing app. Questioning how interesting it is to implement.
- Not sure how standardisable it is.
There may be something we could surface, but he says it might be better for people to analyze their audio themselves with WebAudio.
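The DIY route suggested above might look roughly like this: attach an `AnalyserNode` to the remote stream and compute a level estimate from time-domain samples. `computeRms` is a pure helper; the browser wiring is shown only in comments since it needs a live `MediaStream`.

```javascript
// RMS level of a buffer of samples normalized to [-1, 1].
function computeRms(buf) {
  let sum = 0;
  for (const s of buf) sum += s * s;
  return Math.sqrt(sum / buf.length);
}

// Browser wiring (assumed, not runnable here):
//   const ctx = new AudioContext();
//   const src = ctx.createMediaStreamSource(remoteStream);
//   const analyser = ctx.createAnalyser();
//   src.connect(analyser);
//   const buf = new Float32Array(analyser.fftSize);
//   setInterval(() => {
//     analyser.getFloatTimeDomainData(buf);
//     console.log("level:", computeRms(buf));
//   }, 100);

const buf = new Float32Array(4).fill(0.5);
const level = computeRms(buf); // 0.5
```

Deciding which levels count as "background" would still be up to the app, which is the same open question the thread started with.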
I'm not sure what to do with this one, so I'm removing the TPAC label.
@vr000m If you still want this (or #383) to be discussed at TPAC feel free to prepare a slide for it.