element-call Voice activity detection using a threshold slider

Your use case

https://github.com/vector-im/element-call/pull/492

Have you considered any alternatives?

No response

Additional context

No response

Aug 31 '22 14:08 fkwp

@hugohutri can you please update the issue and give some more context?

Aug 31 '22 14:08 fkwp

Hugo is enjoying some vacation right now, but since we did this together I can of course fill in.

The idea is to have the same feature that Mumble, Discord, and TeamSpeak provide. They only send the microphone stream to other users while the mic level is above a threshold. This threshold is usually editable on a settings page, with a slider that also indicates whether you are currently above the threshold.
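To make the threshold idea concrete, here is a minimal sketch of such a gate. It is not the code from the PR: `rmsLevel`, `ThresholdGate`, and the `holdBlocks` parameter are hypothetical names chosen for illustration, and it assumes the audio arrives as blocks of samples in the range [-1, 1].

```typescript
// Root-mean-square level of one block of samples in the range [-1, 1].
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumOfSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumOfSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumOfSquares / samples.length);
}

// Hypothetical gate: opens when a block reaches the threshold and stays
// open for `holdBlocks` further quiet blocks so word endings are not cut.
class ThresholdGate {
  private readonly threshold: number;
  private readonly holdBlocks: number;
  private quietBlocks: number;

  constructor(threshold: number, holdBlocks: number) {
    this.threshold = threshold;
    this.holdBlocks = holdBlocks;
    this.quietBlocks = holdBlocks + 1; // start closed
  }

  // Returns true if this block should be sent to other users.
  process(samples: Float32Array): boolean {
    if (rmsLevel(samples) >= this.threshold) {
      this.quietBlocks = 0;
    } else if (this.quietBlocks <= this.holdBlocks) {
      this.quietBlocks += 1;
    }
    return this.quietBlocks <= this.holdBlocks;
  }
}
```

The threshold passed to the constructor is what the settings slider would control, and the value returned by `rmsLevel` is what the slider's level indicator would display.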

What does this solve? It's an extremely fast solution to background noise, as you only send data when you talk. No more keyboard clacking, birds, or other noises unless you talk. In other words, it's a poor man's noise suppression that is cheap to include and works against any kind of noise. In apps such as Teams and currently Jitsi, it's extremely annoying to hear people talking who aren't in the call.

You can already do this on your own system with tools like EasyEffects, but building it in would guarantee that every user has it.

Some info on the current PR: we tried to use the existing Volume Looper as it seemed intuitive. The only problem is that the activation point is always just a tad too slow (we are talking a few ms here). The solution I have thought of is using the createDelay() function, but I have not been able to find a place to hook that node up to what we send to other users.

In short, the idea is to create a small delay (e.g. 5 ms) for every stream we send to other users, so that we have time to analyze and manipulate the stream. This delay could also be used in the future for more advanced audio processing. In the end the short delay should not matter for other users, as they will have a delay either way (ping). What matters is that all streams, voice and video, arrive at the same time.

The actual analysis is done on a clone of said audio stream. It is important that this cloned stream does not have the delay. When we detect that the user is loud enough (on the cloned stream), we enable the tracks of the real, delayed stream. This ensures we are enabling the microphone before the start of a sentence.
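The effect of delaying only the sent stream can be illustrated with a small offline simulation. This is a sketch under assumptions, not the PR's code: `gateWithLookahead` and all of its parameters are hypothetical, the detector is modelled as reacting `detectorLagSamples` late, and a real implementation would work on live Web Audio nodes rather than on arrays.

```typescript
// Simulates gating a sent signal based on detection done on an undelayed copy.
//   detectorLagSamples: how late the detector reacts to a loud sample (assumed)
//   delaySamples:       delay applied to the sent signal only
//   holdSamples:        how long the gate stays open after the last loud sample
function gateWithLookahead(
  input: Float32Array,
  threshold: number,
  detectorLagSamples: number,
  delaySamples: number,
  holdSamples: number,
): Float32Array {
  const output = new Float32Array(input.length); // starts as silence
  let samplesSinceLoud = Number.POSITIVE_INFINITY; // gate starts closed

  for (let i = 0; i < input.length; i++) {
    // The detector works on the undelayed copy but only reacts after its lag.
    const seenIndex = i - detectorLagSamples;
    const seenLevel = seenIndex >= 0 ? Math.abs(input[seenIndex]) : 0;
    if (seenLevel >= threshold) {
      samplesSinceLoud = 0;
    } else {
      samplesSinceLoud += 1;
    }

    // The sent signal is the input shifted by the delay, muted while closed.
    const sendIndex = i - delaySamples;
    const gateOpen = samplesSinceLoud <= holdSamples;
    output[i] = gateOpen && sendIndex >= 0 ? input[sendIndex] : 0;
  }
  return output;
}

// Speech starts at index 2; the detector reacts 2 samples late.
const speech = new Float32Array([0, 0, 0.8, 0.9, 0.7, 0, 0, 0, 0, 0]);
const withoutDelay = gateWithLookahead(speech, 0.5, 2, 0, 2);
const withDelay = gateWithLookahead(speech, 0.5, 2, 2, 2);
```

In this example `withoutDelay` loses the first two loud samples because the gate opens too late, while `withDelay` contains the whole burst, shifted by two samples. The delay has to be at least as long as the detector's lag for the onset to survive.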

Links to the PRs: https://github.com/vector-im/element-call/pull/492 https://github.com/matrix-org/matrix-js-sdk/pull/2556

Aug 31 '22 16:08 DashieTM