
Kerchunking Protection / Voice Activity Detection

Open: s1lviu opened this issue (https://github.com/sm0svx/svxlink/issues/661) 1 year ago • 6 comments

Hello,

I'm interested in enhancing the SVXLink project by integrating voice activity detection (VAD) to address kerchunking (keying up a repeater briefly without saying anything). After some research, I found the libfvad library (https://github.com/dpirch/libfvad), which seems to be a promising way to implement VAD.

However, my knowledge of C++ is limited, and I'm seeking guidance on the best way to integrate libfvad within the SVXLink project's structure. My goal is to implement a kerchunking protection mechanism that could efficiently detect when the PTT is activated without any voice activity and prevent unnecessary transmissions.

Could someone provide advice on:

  1. How to effectively integrate libfvad with SVXLink's existing audio processing pipeline?
  2. Any potential challenges I should be aware of when working with libfvad and SVXLink together?
  3. Recommendations for testing the implementation to ensure it works as intended without introducing significant latency or affecting the system's performance.
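For question 1, here is a minimal sketch of the re-blocking step that any integration would likely need. libfvad's fvad_process(inst, frame, length) takes 16-bit samples in frames of exactly 10, 20 or 30 ms, at 8, 16, 32 or 48 kHz, and returns 1 (voice), 0 (no voice) or -1 (invalid frame length). SvxLink's audio pipeline, by contrast, passes blocks of float samples of arbitrary length (for example through Async::AudioSink::writeSamples(const float*, int)), so the samples have to be converted and buffered first. The class name VadFrameAdapter and the callback wiring are hypothetical. In a real build the callback would call fvad_process() on an instance created with fvad_new() and configured with fvad_set_mode() and fvad_set_sample_rate().

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical adapter between a float-sample audio pipeline and a
// frame-based voice activity detector such as libfvad.
class VadFrameAdapter {
 public:
  // Same return convention as fvad_process(): 1 = voice, 0 = no voice,
  // -1 = error (e.g. invalid frame length).
  using Detector = std::function<int(const int16_t*, size_t)>;

  VadFrameAdapter(int sample_rate, int frame_ms, Detector detector)
      : frame_len_(static_cast<size_t>(sample_rate) * frame_ms / 1000),
        detector_(std::move(detector)) {
    // libfvad only accepts 10, 20 or 30 ms frames.
    assert(frame_ms == 10 || frame_ms == 20 || frame_ms == 30);
    frame_.reserve(frame_len_);
  }

  // Accepts a block of any length, converts float [-1.0, 1.0] samples to
  // 16-bit PCM and runs the detector on every complete frame. An incomplete
  // frame stays buffered until the next call. Returns the number of frames
  // completed during this call.
  int write(const float* samples, int count) {
    int completed = 0;
    for (int i = 0; i < count; ++i) {
      float s = samples[i];
      if (s > 1.0f) s = 1.0f;  // clip instead of wrapping around
      if (s < -1.0f) s = -1.0f;
      frame_.push_back(static_cast<int16_t>(std::lround(s * 32767.0f)));
      if (frame_.size() == frame_len_) {
        int result = detector_(frame_.data(), frame_.size());
        if (result >= 0) ++total_;
        if (result == 1) ++voiced_;
        frame_.clear();
        ++completed;
      }
    }
    return completed;
  }

  size_t frameLength() const { return frame_len_; }
  int voicedFrames() const { return voiced_; }
  int totalFrames() const { return total_; }

 private:
  size_t frame_len_;
  Detector detector_;
  std::vector<int16_t> frame_;
  int voiced_ = 0;
  int total_ = 0;
};
```

At an assumed 16 kHz sample rate, a 20 ms frame is 320 samples, so the buffering itself delays each decision by at most one frame (20 ms). That is the kind of figure worth measuring for question 3. A plain energy threshold can stand in for fvad_process() when testing the buffering logic on its own.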

Any insights, advice, or examples of similar implementations would be greatly appreciated. I'm eager to contribute to the SVXLink project and help improve its functionality.

Thank you!

s1lviu, Feb 01 '24 18:02

Hi Silviu,

You will find that this is already implemented to some degree in [RepeaterLogic] and its logic file, RepeaterLogic.tcl. I had this running on a repeater in France some time ago, but quite frankly it was annoying: the repeater kept announcing an unidentified transmission every few minutes, such was the frequency of the abuse.

Look at these two settings in man svxlink.conf:

SQL_FLAP_SUP_MIN_TIME: Flapping squelch suppression is used to close the repeater down if there is interference on the frequency that opens the squelch in short bursts. This configuration variable specifies the minimum time, in milliseconds, that a transmission must last to be classified as a real transmission. A good value is between 500 and 2000 ms.

SQL_FLAP_SUP_MAX_COUNT: Flapping squelch suppression is used to close the repeater down if there is interference on the frequency that opens the squelch in short bursts. This configuration variable specifies the maximum number of consecutive short squelch openings allowed before shutting the repeater down. A good value is between 5 and 10.
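To make the two settings concrete, here is a hypothetical C++ model of the behaviour the man page describes. It is not SvxLink's actual implementation. It treats an opening shorter than SQL_FLAP_SUP_MIN_TIME as a short burst and trips once more than SQL_FLAP_SUP_MAX_COUNT of them occur in a row; whether SvxLink trips at or just after the limit is an assumption here.

```cpp
#include <cassert>

// Hypothetical model of flapping squelch suppression as described in
// svxlink.conf(5). Not SvxLink's actual implementation.
class FlapSuppressor {
 public:
  FlapSuppressor(int min_time_ms, int max_count)
      : min_time_ms_(min_time_ms), max_count_(max_count) {
    assert(min_time_ms > 0 && max_count >= 0);
  }

  // Call each time the squelch closes, with how long it was open.
  // Returns true when the repeater should be shut down.
  bool squelchClosed(int open_duration_ms) {
    if (open_duration_ms >= min_time_ms_) {
      short_count_ = 0;  // a real transmission ends the streak
      return false;
    }
    ++short_count_;
    // max_count short openings are allowed; one more trips the suppression.
    return short_count_ > max_count_;
  }

  int shortCount() const { return short_count_; }

 private:
  int min_time_ms_;
  int max_count_;
  int short_count_ = 0;
};
```

The model reacts only to durations and streaks, so an isolated kerchunk between normal transmissions never builds up a streak and is never acted on.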

73 Chris G4NAB


f5vmr, Feb 01 '24 20:02

Hi Chris,

Thank you for your detailed response and for outlining the measures already in place with [RepeaterLogic] to mitigate unintended transmissions and interference. Your experience with the system in France highlights a critical balance we need to strike between reducing abuse and ensuring the repeater remains user-friendly and accessible.

Your explanation of the SQL_FLAP_SUP_MIN_TIME and SQL_FLAP_SUP_MAX_COUNT settings gives me a good starting point for understanding how SVXLink currently handles short bursts and squelch flapping. It’s clear that while these settings offer some control, the challenge of users activating the repeater without meaningful transmission persists, hence my interest in exploring the integration of voice activity detection (VAD).

The idea behind leveraging libfvad is to add another layer of filtering that specifically targets the issue of ‘kerchunking’ by identifying whether an activated transmission actually contains voice. This could complement the existing configurations by providing a more nuanced approach to handling transmissions, potentially reducing the frequency of announcements about unannounced transmissions and improving the overall user experience.
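The check "does this transmission actually contain voice" could be expressed as a rule over the per-frame VAD decisions collected during one squelch opening. A minimal sketch: isKerchunk and KerchunkPolicy are hypothetical names, and the 20 ms frame and 400 ms threshold are illustrative values that would need tuning on real traffic.

```cpp
#include <cassert>
#include <vector>

// Hypothetical policy values; illustrative, not tuned.
struct KerchunkPolicy {
  int frame_ms = 20;       // duration of one VAD frame
  int min_voice_ms = 400;  // shortest run of continuous speech that counts
};

// decisions: one VAD result per frame for a single squelch opening
// (1 = voice, 0 = no voice). Returns true if the opening never contained
// a run of continuous speech at least min_voice_ms long.
bool isKerchunk(const std::vector<int>& decisions, const KerchunkPolicy& p) {
  assert(p.frame_ms > 0);
  int run = 0;
  int longest = 0;
  for (int d : decisions) {
    run = (d == 1) ? run + 1 : 0;  // anything other than 1 breaks the run
    if (run > longest) longest = run;
  }
  return longest * p.frame_ms < p.min_voice_ms;
}
```

Requiring a run of consecutive voiced frames, rather than a total, is one way to tolerate isolated false positives, such as a detector reacting briefly to noise at key-up or when the squelch closes.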

I appreciate your insights into the potential annoyances of frequent repeater announcements. As I work on integrating libfvad, I’ll be mindful of the balance between effectively managing misuse and maintaining a positive user experience. I’ll also take into consideration your advice on examining the man svxlink.conf sections closely as I develop this feature.

Thank you again for your guidance. I’m hopeful that by combining our efforts and insights, we can enhance SVXLink’s functionality and make it even more robust against unintentional activations while keeping it welcoming for all users.

73, Silviu

s1lviu, Feb 01 '24 20:02

I wanted to share my thoughts on Silviu's suggestion of distinguishing voice transmissions from an unmodulated carrier. Put simply, it's like having a DTMF decoder, but for real speech, which sounds pretty intriguing.

The existing methods to prevent misuse by ham operators seem rather rudimentary, relying mainly on timers. While they do the job to some extent, they aren't foolproof against someone with malicious intent who can bypass them easily.

I'm fully in favor of this feature request as it opens up numerous development opportunities and possibilities for new integrations.

Razvan / YO6NAM

yo6nam, Feb 01 '24 21:02

The current "flapping squelch suppression" is not aimed at suppressing misuse but at QRM suppression. Implementing features to stop misuse is almost always a waste of time, because people who misuse the system on purpose will find a way around the countermeasures. They will probably see it as a challenge to overcome.

In the case of adding VAD, you will probably get people transmitting nonsense speech instead of just a carrier. So the problem has actually gotten worse.

sm0svx, Feb 25 '24 16:02

Thank you, @sm0svx, for your perspective on the challenges of implementing countermeasures against misuse.

I understand your concerns about potential adaptability by those intent on misusing the system. However, I believe integrating VAD could offer benefits in improving system efficiency and the user experience for legitimate operators, aside from just attempting to deter misuse.

Given the complexity of system management and misuse prevention, I propose a trial implementation of VAD to assess its practical benefits and challenges. This could help us gather data on its impact and refine our approach based on real-world usage. Additionally, the RoLink network, with over 100 active nodes, offers an excellent opportunity for real-world testing of this approach.

I have started experimenting with libfvad, which I managed to build into the project, but it does not recognize voice reliably within the audio processing pipeline. Specifically, I’ve been trying to integrate it in Reflector.cpp and ReflectorClient.cpp, but with limited success: it reports voice every time.

Any advice or insights into my current implementation challenges would also be greatly appreciated.

Thank you for considering this approach. I’m looking forward to your suggestions and any further guidance you could provide.

73, Silviu

s1lviu, Feb 25 '24 17:02

> In the case of adding VAD, you will probably instead get people transmitting nonsense speech instead of just a carrier. So then the problem has actually gotten worse.

In a large network, kerchunking often becomes nearly unavoidable without tuning certain configuration variables, such as delays. However, as the network grows, administrators find it hard to control each node or repeater individually.

Making the reflector smarter could greatly improve the listening experience across the whole network. For instance, when someone keys up to check that a repeater is connected, every other node is activated as well. When this happens often, say 10 times per hour, it becomes a real burden and can make people want to switch their radio off altogether.

So why not explore integrating the VAD library into our system? Merely speculating about what others might do won't move the SvxLink project forward.
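One way the reflector could act on VAD is to hold a talker's audio back until speech is confirmed, and drop it if the talker unkeys first, so a connectivity check never reaches the other nodes. A minimal sketch under stated assumptions: VoiceGate is a hypothetical class, and each frame arrives with a VAD decision computed elsewhere. That decision has to be made on decoded audio, since the payload of msg.audioData() is encoded (Opus by default, as sm0svx explains further down), but the frames being forwarded can stay encoded.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical reflector-side gate for one talker. Frames are held back
// until voiced_needed consecutive voiced frames have been seen; then the
// backlog is released and later frames pass straight through. If the
// talker stops first, the backlog is discarded and nothing is forwarded.
template <typename Frame>
class VoiceGate {
 public:
  VoiceGate(int voiced_needed, std::size_t max_backlog)
      : voiced_needed_(voiced_needed), max_backlog_(max_backlog) {
    assert(voiced_needed > 0 && max_backlog > 0);
  }

  // Pushes one frame with its VAD decision; returns the frames to forward now.
  std::vector<Frame> push(const Frame& frame, bool voiced) {
    std::vector<Frame> out;
    if (open_) {  // speech already confirmed: pass through
      out.push_back(frame);
      return out;
    }
    backlog_.push_back(frame);
    if (backlog_.size() > max_backlog_) backlog_.pop_front();  // bound memory
    run_ = voiced ? run_ + 1 : 0;
    if (run_ >= voiced_needed_) {  // speech confirmed: release the backlog
      open_ = true;
      out.assign(backlog_.begin(), backlog_.end());
      backlog_.clear();
    }
    return out;
  }

  // Talker unkeyed: resets the gate and returns how many frames were dropped.
  std::size_t endOfTransmission() {
    std::size_t dropped = backlog_.size();
    backlog_.clear();
    open_ = false;
    run_ = 0;
    return dropped;
  }

  bool isOpen() const { return open_; }

 private:
  int voiced_needed_;
  std::size_t max_backlog_;
  bool open_ = false;
  int run_ = 0;
  std::deque<Frame> backlog_;
};
```

The cost is latency: with 20 ms frames and voiced_needed = 10, nothing reaches the other nodes until about 200 ms of continuous speech has been detected. The max_backlog cap bounds memory if someone holds a carrier with no speech, at the price of trimming the oldest held-back audio.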

yo6nam, Feb 25 '24 18:02

Hey @sm0svx, could you at least tell me the format and encoding of the audio data in msg.audioData() that arrives here? https://github.com/sm0svx/svxlink/blob/cde00792406352a8df3425177945d81fd0e874c6/src/svxlink/reflector/Reflector.cpp#L442 Many thanks!

s1lviu, Mar 06 '24 12:03

The audioData method returns a byte array:
https://github.com/sm0svx/svxlink/blob/cde00792406352a8df3425177945d81fd0e874c6/src/svxlink/reflector/ReflectorMsg.h#L1153
The array can contain any encoding that SvxLink supports. Right now it's implemented so that the reflector tells all nodes which encoding they should use. Most reflectors will use Opus since that is the default:
https://github.com/sm0svx/svxlink/blob/cde00792406352a8df3425177945d81fd0e874c6/src/svxlink/reflector/ReflectorClient.cpp#L177-L204
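Since msg.audioData() carries codec frames (Opus by default), one plausible explanation for libfvad reporting voice every time, as described earlier in the thread, is that it was fed compressed bytes rather than PCM. A minimal sketch of the decode-first step: PcmDecoder, RawS16LeDecoder and countVoicedFrames are hypothetical names. A real decoder would wrap whichever codec the reflector negotiated, for example libopus' opus_decode() for Opus; the raw 16-bit little-endian decoder exists only so the pipeline can be exercised without a codec library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical interface: turns one encoded payload (such as the bytes
// returned by msg.audioData()) into 16-bit PCM samples.
class PcmDecoder {
 public:
  virtual ~PcmDecoder() = default;
  virtual std::vector<int16_t> decode(const std::vector<uint8_t>& payload) = 0;
};

// Test stand-in only: interprets the payload as raw little-endian 16-bit PCM.
// A trailing odd byte is ignored.
class RawS16LeDecoder : public PcmDecoder {
 public:
  std::vector<int16_t> decode(const std::vector<uint8_t>& payload) override {
    std::vector<int16_t> pcm;
    for (std::size_t i = 0; i + 1 < payload.size(); i += 2) {
      uint16_t u = static_cast<uint16_t>(payload[i] | (payload[i + 1] << 8));
      pcm.push_back(static_cast<int16_t>(u));
    }
    return pcm;
  }
};

// Decodes first, then runs detect() (1 = voice) on each complete frame of
// frame_len samples. Returns the number of voiced frames; leftover samples
// that do not fill a frame are ignored in this sketch.
template <typename Detect>
int countVoicedFrames(PcmDecoder& dec, const std::vector<uint8_t>& payload,
                      std::size_t frame_len, Detect detect) {
  assert(frame_len > 0);
  std::vector<int16_t> pcm = dec.decode(payload);
  int voiced = 0;
  for (std::size_t off = 0; off + frame_len <= pcm.size(); off += frame_len) {
    if (detect(pcm.data() + off, frame_len) == 1) ++voiced;
  }
  return voiced;
}
```

The decoded audio must also be at a sample rate libfvad accepts (8, 16, 32 or 48 kHz), so the rate the decoder produces matters as much as the codec itself.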

sm0svx, Mar 11 '24 21:03

Great, @sm0svx, thanks for the clarification, and also for this insight:

> the reflector tells all nodes what encoding they should use

Later edit: For anyone interested in this feature, I finally managed to implement it here.

s1lviu, Mar 12 '24 07:03