element-call

Feat: Voice activity threshold slider

hugohutri opened this issue 3 years ago • 18 comments

Adds a voice activity threshold slider, since this is a pretty important feature for voice calls, and a similar feature can be found in Discord etc. If the volume is below the threshold, the track will be muted.
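A minimal sketch of that rule, assuming the level is the RMS of one frame of float samples in [-1, 1] expressed in dBFS. The function names are hypothetical and not taken from the PR or from matrix-js-sdk, which may measure the level differently:

```typescript
// Hypothetical helpers illustrating a volume-threshold gate (not the PR's code).

/** RMS level of one frame of samples in [-1, 1], in dBFS (0 dB = full scale). */
function frameLevelDb(samples: Float32Array): number {
  if (samples.length === 0) return -Infinity;
  const sumSquares = samples.reduce((acc, s) => acc + s * s, 0);
  const rms = Math.sqrt(sumSquares / samples.length);
  // Math.log10(0) is -Infinity, so digital silence maps to -Infinity dB.
  return 20 * Math.log10(rms);
}

/** The track is muted for this frame when its level is below the threshold. */
function shouldMute(levelDb: number, thresholdDb: number): boolean {
  return levelDb < thresholdDb;
}
```

With a threshold of -55 dB, a frame measured at -70 dBFS would be muted and one at -40 dBFS would be sent.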

Added features:

  • [x] Reusable slider component
  • [x] Voice activity threshold slider in the settings modal
  • [x] Volume indicator in the slider track
  • [x] Working voice activity detection
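The volume indicator in the slider track needs a mapping from a dB level to a position along the track, plus a colour showing whether the current level would pass the threshold. A sketch assuming a slider range of -100 dB to 0 dB and green/red colouring; the bounds and function names are illustrative assumptions, not the PR's values:

```typescript
// Hypothetical mapping for the volume indicator drawn in the slider track.
// The slider is assumed to span MIN_DB..MAX_DB.
const MIN_DB = -100;
const MAX_DB = 0;

/** Position of a dB value along the track, as a percentage in [0, 100]. */
function dbToPercent(db: number): number {
  const clamped = Math.min(MAX_DB, Math.max(MIN_DB, db));
  return ((clamped - MIN_DB) / (MAX_DB - MIN_DB)) * 100;
}

/** Green when the voice would be sent, red when it would be gated. */
function indicatorColor(levelDb: number, thresholdDb: number): "green" | "red" {
  return levelDb >= thresholdDb ? "green" : "red";
}
```

Clamping keeps digital silence (-Infinity dB) pinned to the left edge instead of producing an invalid width.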

https://user-images.githubusercontent.com/55588133/185294116-266c3aa2-b041-461e-bbde-5b14ae203779.mp4

Requires: https://github.com/matrix-org/matrix-js-sdk/pull/2556

hugohutri, Aug 01 '22

Some initial feedback on this: mainly, it looks like this is done using the volume analysis. I would think that this is always going to miss the very start of a phrase, since it will already have been discarded by the time it has triggered a volume event, so this approach might not work. I'm not an expert on how VAD is done 'properly', though (i.e. without introducing latency to do the analysis).
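One common way to avoid clipping the onset, at the cost of exactly the latency mentioned here, is a lookahead (pre-roll) gate: outgoing audio is delayed by a few frames, so the gate can open before the first loud frame is released. A minimal sketch over per-frame levels; the class name and frame counts are hypothetical, and this is not what the PR does:

```typescript
// Hypothetical lookahead ("pre-roll") gate: frames are delayed by `lookahead`
// frames so the gate can open before the first loud frame is sent.
// The trade-off is `lookahead` frames of extra latency.
interface Frame {
  samples: Float32Array;
  levelDb: number;
}

class LookaheadGate {
  private readonly queue: Frame[] = [];
  private readonly thresholdDb: number;
  private readonly lookahead: number;

  constructor(thresholdDb: number, lookahead: number) {
    this.thresholdDb = thresholdDb;
    this.lookahead = Math.max(0, Math.floor(lookahead));
  }

  /**
   * Push the newest frame and its level. Returns the frame leaving the delay
   * line (zeroed if the gate is closed), or null while the line is filling.
   */
  push(samples: Float32Array, levelDb: number): Float32Array | null {
    this.queue.push({ samples, levelDb });
    if (this.queue.length <= this.lookahead) return null;
    const out = this.queue.shift()!;
    // Open if the outgoing frame, or any frame still queued behind it, is loud.
    const open =
      out.levelDb >= this.thresholdDb ||
      this.queue.some((f) => f.levelDb >= this.thresholdDb);
    return open ? out.samples : new Float32Array(out.samples.length);
  }
}
```

This sketch closes immediately after the last loud frame leaves the line; a practical gate would usually also add a hold or release time so word endings are not cut.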

(Also, it's "threshold" with an 'h'.)

dbkr, Aug 19 '22

@hugohutri and I both tried it on our machines and it seemed to work fine; we can of course do some more testing. Other than that, do you have any other ideas on how we could accomplish it? In theory it shouldn't be much of a hassle, but we need to get the samples, and this seemed to be the best way.

DashieTM, Aug 23 '22

Typo is now fixed. I haven't noticed any problems with the start of a phrase, and it seems to capture it.

hugohutri, Aug 24 '22

Signed-off-by: Hugo Hutri [email protected]

hugohutri, Aug 25 '22

Signed-off-by: Fabio Lenherr / DashieTM [email protected]

DashieTM, Aug 25 '22

A few questions/comments from a product POV:

  • What is the default position of the threshold? My expectation is that it would be to the far left, therefore meaning the feature is effectively disabled by default and users must make a positive choice to enable it by selecting a position somewhere on the slider. Is that correct?

  • Understanding the slider relies on me speaking while I slide it. I feel we need additional UI or explanatory text to make this clearer. It would not be obvious what this feature does if I land on this screen while there is no audio to be picked up and trigger the red/green bar.

  • Related to the above, this appears in a setting screen that can be accessed and used while I am in a call. This raises a couple of concerns.

    • If I am actively speaking on the call while testing the correct position, then presumably my audio will cut in and out as I move the slider around. That is hardly ideal. We might get around that by having it not take effect until I 'let go' of the slider, but is that detail going to be obvious to the user? If not, it creates the risk that they might think their audio is being blocked when in fact it is not. A clearer, more positive option might be a 'save' button to put the configuration into effect.
    • More likely, people will want to configure the slider position while not actually engaged in conversation, e.g. by saying 'testing, testing' or other nonsense. Is it sufficiently clear to users that while this screen is visible their audio is still active and they may be heard by other people on the call unless they have muted themselves first? I suspect it isn't. One option might be to add some UI to make it clear they aren't muted, and give them the option to mute, from within this settings panel.
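Both options above (apply on release, or an explicit save button) can be modelled by keeping a draft value separate from the threshold that actually gates the audio. A minimal, UI-framework-agnostic sketch; the class and method names are hypothetical:

```typescript
// Hypothetical model separating the slider's preview value from the
// threshold that actually gates the outgoing audio.
class ThresholdSetting {
  private active: number;
  private draft: number;

  constructor(initialDb: number) {
    this.active = initialDb;
    this.draft = initialDb;
  }

  /** Called continuously while the user drags the slider. */
  preview(db: number): void {
    this.draft = db;
  }

  /** Called on release, or when a "Save" button is pressed. */
  commit(): void {
    this.active = this.draft;
  }

  /** Discards the draft, e.g. when the modal is closed without saving. */
  cancel(): void {
    this.draft = this.active;
  }

  get activeDb(): number {
    return this.active;
  }

  get draftDb(): number {
    return this.draft;
  }
}
```

In the DOM, an `<input type="range">` fires `input` events while being dragged and a `change` event when the value is committed, so `preview` could be wired to the former and `commit` to the latter (or to a save button).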

There is a risk that to resolve all of these points, the UI could become over-complex. It would be useful to have a designer take a look at this as well. @gaelledel, not sure if you have any thoughts on this.

jakewb-b, Aug 31 '22

https://github.com/vector-im/element-call/issues/562

fkwp, Aug 31 '22

Right now the threshold is set to -55, but we can still lower it if needed. Lowering it to -100 effectively disables the feature altogether, as the gate is then no longer needed.
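Assuming the threshold is in dBFS (the thread does not state the unit), the conversion to linear amplitude shows how large the gap between -55 and -100 is. A small sketch with hypothetical helper names:

```typescript
// Conversions between dBFS and linear amplitude (full scale = 1.0).
function dbToAmplitude(db: number): number {
  return Math.pow(10, db / 20);
}

function amplitudeToDb(amplitude: number): number {
  return 20 * Math.log10(amplitude);
}
```

Every 20 dB is a factor of ten in amplitude, so -100 dB is 10^-5 of full scale, and moving the threshold from -55 to -100 lowers the level needed to pass by a factor of roughly 178. That puts it far below the background noise a real microphone normally picks up, which is why the gate is effectively open at that setting.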

About the UI concerns: we can still add some text beneath the feature if necessary, or include other UI cues to make it clearer. We tried to follow the UI style of other programs that have this feature.

For the last concern about setting the threshold, I think it would indeed be best if we simply mute the microphone during the "testing" phase. I assume a button would be good here, so that we don't mute when the settings page opens, but only when we actually want to move the slider around.

DashieTM, Aug 31 '22

Here is a screenshot from Discord to compare the UI: [image]

DashieTM, Aug 31 '22

> Right now the Threshold is set to -55, but we can still lower it if needed. Lowering it to -100 does indeed remove the feature altogether as it isn't needed then.

Thanks. I'd want the default to be -100, at least to begin with, so that when we first release there is no change to the sensitivity for most users, but those who want to make a change can do so.

> For the last concern about setting the threshold, I think it would indeed be the best if we simply mute the microphone during the "testing" phase. I assume a button would be good here in order to not mute when we open the settings page, but only when we actually want to move the slider around.

I agree. My concern is that there are several things that we need to make clear to the user, without over-complicating the UI.

  • You can slide the slider to adjust the threshold
  • You need to speak to see your current sound level
  • You can mute yourself, so you can test audio without being heard on your call

Exactly how we mute users needs consideration.

Only muting when they interact with the slider is insufficient, I think. We could just add a mute button, and let them mute/unmute themselves. A third option is to have a separate screen to open the slider, and while they are in that screen they are always muted.

I'd lean towards the mute button as the clearest and simplest, and this can double-up as a way to clearly show them if they are un-muted and so sending their audio to the call.

However, I'm not a UX designer, so I'm open to other ways of addressing my concerns, but I do think they need addressing somehow.

The feature is nice, though, and nicely implemented. If we can clear up these little details it will be a good addition, I think.

jakewb-b, Sep 01 '22

Alright, that sounds good. I will make sure the UI etc. is created. After that I can of course make changes where necessary.

[Screen recording: Kooha-2022-09-01-18-05-09] The preview is a bit bugged for some reason (slow motion), but this is how it currently looks with the button implemented.

  • The button mutes the microphone and enables changing the threshold.
  • The default threshold is now -100. (My suggestion is to change this to about -60 later.)
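The button behaviour described above (it mutes the microphone and unlocks the slider) could be sketched as a small test-mode object that remembers the user's previous mute state and restores it when the test ends. All names are hypothetical; the `applyMute` callback stands in for whatever mute control the app really uses:

```typescript
// Hypothetical "threshold test mode": while active, outgoing audio is muted
// and the slider is editable; on exit, the previous mute state is restored.
class ThresholdTestMode {
  private testing = false;
  private mutedBeforeTest = false;
  private muted: boolean;
  private readonly applyMute: (muted: boolean) => void;

  constructor(initiallyMuted: boolean, applyMute: (muted: boolean) => void) {
    this.muted = initiallyMuted;
    this.applyMute = applyMute; // assumed hook into the real mute control
  }

  /** The slider is only editable while the test is running. */
  get sliderEnabled(): boolean {
    return this.testing;
  }

  start(): void {
    if (this.testing) return;
    this.testing = true;
    this.mutedBeforeTest = this.muted;
    this.setMuted(true);
  }

  stop(): void {
    if (!this.testing) return;
    this.testing = false;
    this.setMuted(this.mutedBeforeTest);
  }

  private setMuted(muted: boolean): void {
    this.muted = muted;
    this.applyMute(muted);
  }
}
```

One caveat worth checking: if muting is done by setting the sent `MediaStreamTrack`'s `enabled` to false, a level meter attached to that same track would also read silence, so the meter would need its own source, for example a `clone()` of the microphone track.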

DashieTM, Sep 01 '22

In Mumble, there are more settings for triggering voice activation. They even have two sliders for some kind of pre-activation. But anyway, maybe it already works well enough with some hardcoded values.
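Two sliders are commonly used to add hysteresis: the gate opens only when the level rises above an upper threshold and closes only when it falls below a lower one, so speech hovering around a single threshold does not chatter on and off. A minimal sketch under that assumption; the class name and values are illustrative and this is not Mumble's code:

```typescript
// Hypothetical two-threshold (hysteresis) voice gate.
class HysteresisGate {
  private open = false;
  private readonly openAboveDb: number;
  private readonly closeBelowDb: number;

  constructor(openAboveDb: number, closeBelowDb: number) {
    if (closeBelowDb > openAboveDb) {
      throw new Error("closeBelowDb must not exceed openAboveDb");
    }
    this.openAboveDb = openAboveDb;
    this.closeBelowDb = closeBelowDb;
  }

  /** Feed one frame's level; returns whether audio should be transmitted. */
  update(levelDb: number): boolean {
    if (!this.open && levelDb >= this.openAboveDb) this.open = true;
    else if (this.open && levelDb < this.closeBelowDb) this.open = false;
    return this.open;
  }
}
```

With the gate opening at -50 dB and closing at -60 dB, a level dipping to -55 dB mid-sentence keeps the gate open, while the same -55 dB from silence does not open it.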

They also have a dropdown setting for when to transmit: "Continuous", "Voice Activity" or "Push To Talk". Maybe it makes sense to provide a similar setting so as not to confuse users? Just a toggle switch with "Enable Voice Activation".
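A setting like this could be represented as a transmit mode plus the per-frame inputs each mode depends on, with the simpler toggle mapping onto two of the modes. A sketch with hypothetical names; the mode labels follow the Mumble ones quoted above:

```typescript
// Hypothetical transmit-mode setting, loosely modelled on Mumble's dropdown.
type TransmitMode = "continuous" | "voice-activity" | "push-to-talk";

interface FrameInput {
  levelDb: number;
  pttPressed: boolean;
}

function shouldTransmit(mode: TransmitMode, input: FrameInput, thresholdDb: number): boolean {
  switch (mode) {
    case "continuous":
      return true;
    case "voice-activity":
      return input.levelDb >= thresholdDb;
    case "push-to-talk":
      return input.pttPressed;
    default:
      throw new Error(`Unknown transmit mode: ${String(mode)}`);
  }
}

/** The simpler "Enable Voice Activation" toggle maps onto two of the modes. */
function modeFromToggle(enableVoiceActivation: boolean): TransmitMode {
  return enableVoiceActivation ? "voice-activity" : "continuous";
}
```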

BTW, does that also work in Walkie-Talkie/PTT mode? It should just "virtually" press the PTT button when over the threshold. But there should be an additional toggle in the Walkie-Talkie UI to enable this. On real walkie-talkies/radios this is usually called "VOX".


fti7, Sep 06 '22

> BTW, does that also work in Walkie-Talkie/PTT Mode? It should just "virtually" press the PTT Button when over the threshold.

I wouldn't expect this to work like this. On a walkie-talkie call, all users are muted and pressing the PTT button un-mutes you. The voice activation threshold isn't un-muting users, but determining whether an un-muted user's audio is transmitted or not. There may be scope for voice-activated PTT in future but I'm not confident in the user requirements here yet, or how it should be implemented, so I don't want to set the expectation that we would necessarily accept a PR that implemented this.
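The distinction drawn here can be expressed as two independent layers: mute (which PTT toggles on a walkie-talkie call) decides whether the user is sending at all, and the threshold only filters audio from a user who is already un-muted. A hypothetical sketch of that ordering:

```typescript
// Hypothetical layering: mute is decided first (PTT in walkie-talkie mode
// toggles it), and the threshold only filters audio that is already un-muted.
interface CallAudioState {
  muted: boolean;
  thresholdDb: number; // e.g. -100 leaves the gate effectively open
}

function isAudioSent(state: CallAudioState, levelDb: number): boolean {
  if (state.muted) return false; // the gate never un-mutes anyone
  return levelDb >= state.thresholdDb; // un-muted audio must still clear the threshold
}
```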

jakewb-b, Sep 06 '22

I moved the option to a separate area (Advanced). I also added more detailed information about the feature to the hint text. The transparency and Firefox issues (slider color, slider progress color) are now fixed as well.


hugohutri, Sep 21 '22

Also, please update the branch.

SimonBrandner, Oct 04 '22

@hugohutri @DashieTM and everyone else who is contributing to this, Thank you! I'm Gaelle, member of the product design team here at Element. I have reviewed the above and my conclusions are the following:

  1. What strikes me the most from a UX perspective is that you are proposing a setting that should apply to the general behaviour of a selected mic inside the call room settings. As this is a general setting, we should place the user in the right context in the system, where general adjustments are made at a universal level rather than a local level. So in order for us to integrate this work, we'd need to create a general settings access point. The access point should be accessible via the Lobby screen: in the top right-hand corner you will see your avatar, and on click you currently have: ProfileName > Sign Out. We could instead have: ProfileName > (cogwheel icon) Settings > Sign Out. Clicking Settings would show the same settings modal as the room settings, but with the slider added to the Audio settings tab, just as you have done above.
    [Screenshot 2022-10-05 at 14 38 54]
  2. The first implementation, with a slider and a defined threshold point, seems most appropriate. I'm no acoustics expert, but this guy here https://acousticnature.com/journal/which-microphone-sensitivity-is-better-dbv-vs-mv suggests the default should be around -50 dB to -60 dB.

  3. I am a fan of the way Discord implemented their functionality: it is simple and effective, visually and interaction-wise.

  4. I agree with adding the extra string of text under the slider to give context and explain what the functionality does.

  5. A way to automatically determine mic sensitivity would be super useful, similar to Discord's toggle switch: a no-brainer for the user, since the system does it!
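One simple heuristic for automatic sensitivity (an assumption for illustration; I can't confirm how Discord does it) is to estimate the background noise floor as a low percentile of recent frame levels, so that speech does not skew it, and then place the threshold a fixed margin above it, clamped to a sensible range. A sketch with hypothetical names and illustrative constants:

```typescript
// Hypothetical automatic threshold: estimate the noise floor as a low
// percentile of recent frame levels, then add a margin and clamp.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("percentile of empty list");
  const sorted = values.slice().sort((a, b) => a - b);
  // Simple index-based percentile, clamped to the valid range.
  const index = Math.min(sorted.length - 1, Math.floor((p / 100) * sorted.length));
  return sorted[Math.max(0, index)]!;
}

function autoThresholdDb(
  recentLevelsDb: number[],
  marginDb = 10, // how far above the noise floor speech must be
  minDb = -60, // never set the threshold quieter than this
  maxDb = -30, // never demand louder speech than this
): number {
  // A low percentile tracks background noise; speech frames sit above it.
  const noiseFloorDb = percentile(recentLevelsDb, 10);
  return Math.min(maxDb, Math.max(minDb, noiseFloorDb + marginDb));
}
```

In a quiet room the result bottoms out at `minDb`; in a noisy one it rises with the noise, but `maxDb` keeps normal speech from being gated.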

Let me know if you'd like to sync in person as well; we can organise a call, no problem.

gaelledel, Oct 05 '22

(re-request my review when ready)

SimonBrandner, Oct 06 '22

Alright thanks @gaelledel for the reply! I will reply to each point individually.

  1. When we first started the PR, I had the same thought about it not being a global change, since it only affects each room individually. However, we also knew that at some point Element Call will be integrated into Element itself, which is why we left it as is. If these settings in the top right will not be available in Element Web / Element Desktop / etc., then we would need to implement them in the Element settings as well. In my opinion, this would also be the proper implementation, as all Element settings could then be changed in one place. We have already addressed some of Simon's points and will update the branches at the weekend; then we can also move this to a new settings page in the top right.

  2. I absolutely agree. We still have the slider, and the threshold can be changed back to -60.

  3. That's exactly where the idea comes from :D

  4. Alright, we can always change it later if needed.

  5. This would be nice, but the first time I looked at it, it was quite a task with all the math involved. We will see if we can implement it in this PR; I will give it another look.

About syncing, I am open to whatever you would like to use. On Matrix, my ID is fabio.lenherr:matrix.org. My time zone is GMT+1; writing is fine during the day, and a call would probably be best in the evening or on Thursdays.

DashieTM, Oct 07 '22

Related: #714 (Noise reduction)

iakat, Jan 20 '23

Hi @hugohutri and @DashieTM, really sorry to have let this PR languish. Unfortunately, the reality is that we don't have the product or design bandwidth to support a feature like this at the moment, given that it doesn't fit in with our roadmap items for the foreseeable future. To reflect this, I'm going to close the PR.

Please don't hesitate to reach out to us again in the public WebRTC room if you want to discuss other ways to contribute to Element Call. I think there's still a lot of opportunities where we'd be happy to let the community get involved, but they're generally more aligned to a video call use case, rather than voice-first.

robintown, Jul 19 '23

@robintown I honestly don't understand this decision. Even if it doesn't fit with the roadmap, this PR is basically done and has been for quite some time. Why not just merge it so the users wanting/needing it are happy, and move on with your roadmap?

SplittyDev, Jul 19 '23

I'm quite disappointed with the progress of this PR, especially considering it's the only one I've been following in this repository. I've been regularly checking for updates on this feature, but none came.

People contributed to this feature because they consider it important, and I share that sentiment. Widely used applications like Discord have a similar feature to deal with background noise quickly. Video calls are in fact voice-first, because you communicate by voice, so having a volume threshold is crucial for a smooth experience. The absence of this feature is the main reason I haven't been using Element Call, and I know others who face the same issue. Additionally, I think this should be a straightforward feature accessible to all users, not confined to an "Advanced" section.

It's unfortunate to see that something seemingly simple, even with the support of the open-source community, can't be implemented due to not enough "bandwidth".

Neotamandua, Jul 19 '23

Hi @Neotamandua @SplittyDev @fti7,

First of all, I would also like to apologise for being so hesitant with this longstanding PR, which also shows that we did not take the decision not to merge it lightly. In fact, we had a passionate discussion about it several times. Element Call was designed to be a slim, lightweight VoIP app that just works, also for non-expert users. Since this proposed voice gate would require a decent amount of technical knowledge from non-Discord, non-power users, we decided against it.

Now, coming to the actual functionality of this PR ("voice activity threshold / gate") and, where applicable, possible workarounds: at least on Linux, using the PipeWire audio daemon, the following projects provide similar functionality:

  • EasyEffects, a collection of audio filters (including a gate) controlled by a nice GUI
  • A virtual mic source using DeepFilterNet's LADSPA plugin, which provides ML noise reduction on the level of Krisp AI

PS: Speaking of 'we': I want to make you aware that this was a team decision and @robintown is only the messenger.

fkwp, Jul 20 '23

I hope this is not considered spam, and I am sorry for the late response, but I was a bit busy this week.

While I am, of course, sad to hear that this feature will not be available in Element Call, I also can't say that this PR was maintained for long; we stopped, as we were simply waiting for a response (see the backend PR for what I mean by no longer maintained). I guess the only thing I would hope for is a bit more transparency on future PRs. It is quite frustrating to see a PR just sit still, but I of course understand that this was by far not the only PR.

Finally, I want to thank all the nice devs, especially Simon, for the helpful and quite fast reviews!

Some additional thoughts: I still believe that Discord users are an untapped source for you guys here. I know they will not bring monetary value, but they could bring a larger user base. All they essentially want is a one-click room join without any accept button, and some voice features: if not VAD, then perhaps regular noise suppression etc. Inspiration can always be taken from other open-source projects such as Mumble and Revolt.

I wish you the best with this project, and I hope to see this in other Matrix clients as well at some point. Cheers.

DashieTM, Jul 23 '23

Hello @fkwp,

I have to disagree with you and everyone involved in making this decision.

Considering that Element Call was intended to be a slim and lightweight VoIP app, it is interesting to see support for screen sharing, which is undoubtedly a valuable feature, but considerably more sophisticated than a simple volume threshold. Precisely because the app is designed for simplicity, a volume threshold is the simplest built-in solution, eliminating the need for complex ML noise reduction or other algorithms. Being optional, this feature serves both technical and non-technical users. Non-power users who may not feel comfortable using it can simply choose not to, but depriving everyone of this option seems unreasonable.

Considering that this app is designed for VoIP, adding a volume threshold is in line with providing an all-encompassing experience. I have personally encountered instances where meetings suffered from poor voice quality, and a volume threshold would have been immensely helpful in rectifying the issue. Even for non-technical users, colleagues in the meeting could easily guide them (even with screenshare) to set the volume threshold correctly, enhancing the overall voice experience for everyone involved, making the whole application more appealing.

Coming back to Discord again: it is a prime example of a mass-adopted application, with a user base that also includes non-power users, which successfully implemented a volume gate. This feature has become a market standard, also adopted by successful predecessors such as TeamSpeak, making it a widely recognized and easily explained tool for managing voice quality. I don't see how it requires a decent amount of technical knowledge to use properly, especially given the previously mentioned fact that other meeting attendees can help out.

I kindly recommend you to reconsider this decision.

Neotamandua, Jul 23 '23

To summarize, I understand that the main objections against this feature are:

  1. Limited design bandwidth to support this feature / features outside the roadmap (brought forward by @robintown)
  2. Element Call needs to just work for non-expert, non-discord, non-power users (@fkwp)
  3. Element Call should be kept lightweight (@fkwp)

Regarding point 1 @SplittyDev has raised an excellent point: Seeing how this PR is basically done, the bandwidth requirement should be minimal.

Regarding point 2 and 3, I fully agree with @Neotamandua and I would like to add a few more thoughts:

About point 2: I believe this argument is misplaced in this discussion, because the feature does not take away from the existing functionality or hamper it in any way. Non-experts can and likely will just ignore such a setting.

About point 3: While I understand that you want to keep Element Call as lightweight as possible, I do believe that the benefits of this feature by far outweigh the added complexity, because it solves a large part of a more complex problem (see also #714) in a very simple way. In none of the major voice or video conferencing applications that I use (this includes not only TeamSpeak and Discord, but also more business-focused applications like Slack, Teams and Zoom) do I hear as much typing, mouse clicking and other background noise as in Element Call. I expect silence when nobody is talking, especially since this is a solved problem virtually everywhere else (and has been solved for more than 15 years, e.g. with TeamSpeak 2).

With that, I would like to join @Neotamandua in politely asking you to reconsider this decision.

MLWeber, Sep 02 '23

I've been following this PR since the beginning, and I was pretty shocked to see it rejected. It's frustrating that Element is constantly being touted as an alternative to Discord and, while there is some truth in that, the things that make Discord Discord are constantly being ignored.

It's getting to the point that whenever I see a social media post from Element or one of the Matrix people comparing Element to Discord, my brain subconsciously edits the message to "better than generic messenger", because Element is leagues behind Discord despite only needing to implement a small number of features to get its foot in the door.

eonrider, Sep 21 '23

I agree completely with the above couple of comments and I cannot wrap my head around how implementing these basic features is not one of their top priorities. I am absolutely certain that the moment element had discord / teamspeak / mumble style voice-only rooms with voice activation they would instantly gain tens if not hundreds of thousands of users. Just search for "self-hosted discord alternative" and you will see that there is a huge demand.

Instead, the focus seems to lie on becoming another competitor to slack or teams, which in my opinion is a lost cause, especially if they don't already have a large userbase that would advocate for element. The only explanation I can come up with is that it is easier to secure investments with a business focused target userbase. I think this can never succeed.

Sorry for the off-topic rant. It is so frustrating to see a project with so much potential go absolutely nowhere due to narrow-minded strategic decisions.

hoptional, Sep 21 '23