Dragonfire gets puzzled with its own female voice.

Open ProphetDaniel opened this issue 6 years ago • 5 comments

Dragonfire listens to the audio that it is itself playing and assumes that it is the user's voice. In that situation it can enter an endless loop.

A simple way to solve that problem is to apply noise cancellation to the sounds generated by the computer, so that the detected sound is only the difference between the ambient sound and the generated sound.

Because the volume and the microphone position vary a lot, the scheme needs a scaling factor, and a single one might suffice. Say input_sound = microphone_input - scaler.generated_computer_sound.volume. The scaler can be estimated whenever Dragonfire says "Good Evening Sir", for example.

Dragonfire can then use only input_sound as the source for processing instead of the raw microphone input.

Jul 03 '17 02:07 ProphetDaniel

@ProphetDaniel I have also experienced the same problem many times. You can work around it either by listening through headphones or by moving your microphone farther away from your speaker (or by reducing the volume).

I don't understand this suggestion of yours: input_sound = microphone_input - scaler.generated_computer_sound.volume. Please try to explain it in a different way.

We could configure Dragonfire not to listen while speaking, but then the ENOUGH / SHUT UP command, which silences her, would stop working. That command exists because Dragonfire reads the whole Wikipedia article when you call (SEARCH|FIND) * (IN|ON|AT|USING) WIKIPEDIA. We cannot remove this feature because it is important for blind users.

Jul 03 '17 10:07 mertyildiran

> You can simply workaround this problem either by listening from your headphones or putting your microphone farther away from your speaker(or reduce the loudness).

@mertyildiran, that is exactly what I did: I used my headphones as a workaround.

> We can configure Dragonfire to not listen while speaking

I don't think Dragonfire should stop listening while speaking, because we humans keep listening while we speak too.

> I don't understand this suggestion of yours: input_sound = microphone_input - scaler.generated_computer_sound.loudness please try to explain in a different way.

This is a possible noise cancellation technique based on the knowledge Dragonfire has about the sound it is generating. The goal is to completely cancel its own voice in the signal processed by the voice recognition library inside Dragonfire, while still listening to what the user has to say.

By the superposition principle of sound waves, it is possible to subtract the estimatedMicrophoneNoise (Dragonfire's voice as picked up by the microphone) from the microphoneSignal to obtain noiseFreeUserInput.

noiseFreeUserInput = microphoneSignal - estimatedMicrophoneNoise.

A sound wave is generated by the computer speaker and travels through the air and the computer case until it reaches the microphone. The speakerWave and the currentLoudness are the available information, and together they represent very well the wave that will be sensed back by the microphone (polluting whatever the user could be saying at the same time). With that information it is possible to estimate the additional sound wave, estimatedMicrophoneNoise, that strikes the microphone and confuses Dragonfire's processing of the user's voice. Ignoring the time it takes for the sound wave to travel from the speaker to the microphone, and writing "." for multiplication:

estimatedMicrophoneNoise ~ speakerWave.currentLoudness

Depending on the acoustic characteristics of the room, combined with the microphone and speaker properties, the speaker sound will be sensed more strongly or more weakly at the microphone. So we add a scaling factor, feedbackScaler.

estimatedMicrophoneNoise = speakerWave.currentLoudness.feedbackScaler
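To make the two formulas above concrete, here is a minimal sketch in plain Python. It assumes the signals are lists of float samples that are already time-aligned and of equal length; the function names and the toy tones are made up for illustration and are not part of Dragonfire.

```python
import math

def estimate_microphone_noise(speaker_wave, current_loudness, feedback_scaler):
    """estimatedMicrophoneNoise = speakerWave . currentLoudness . feedbackScaler"""
    gain = current_loudness * feedback_scaler
    return [gain * s for s in speaker_wave]

def remove_own_voice(microphone_signal, speaker_wave, current_loudness, feedback_scaler):
    """noiseFreeUserInput = microphoneSignal - estimatedMicrophoneNoise"""
    noise = estimate_microphone_noise(speaker_wave, current_loudness, feedback_scaler)
    return [m - n for m, n in zip(microphone_signal, noise)]

# Toy data (assumption): Dragonfire plays a 440 Hz tone while the user "speaks" a 300 Hz tone.
rate = 8000
speaker_wave = [math.sin(2 * math.pi * 440 * n / rate) for n in range(800)]
user_voice = [0.2 * math.sin(2 * math.pi * 300 * n / rate) for n in range(800)]
current_loudness, feedback_scaler = 0.8, 0.5

# What the microphone hears: the user plus the scaled speaker sound.
microphone_signal = [
    u + current_loudness * feedback_scaler * s
    for u, s in zip(user_voice, speaker_wave)
]

noise_free_user_input = remove_own_voice(
    microphone_signal, speaker_wave, current_loudness, feedback_scaler
)
residual = max(abs(a - b) for a, b in zip(noise_free_user_input, user_voice))
print(f"largest difference from the user's voice: {residual:.2e}")
```

In this idealised setting the subtraction recovers the user's voice almost exactly; with a real room, the delay and the frequency response of speaker and microphone would leave a larger residual.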

The feedbackScaler can be estimated initially when Dragonfire greets the user: if the user remains quiet, the microphone picks up almost nothing but Dragonfire's own voice, so the microphone signal will be pretty close to the scaled speakerWave.
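One possible way to do that calibration is a least-squares fit of the microphone recording against the played greeting. The sketch below assumes the user is quiet, the samples are time-aligned, and the only disturbance is a little random room noise; `estimate_feedback_scaler` and all the numbers are hypothetical.

```python
import math
import random

def estimate_feedback_scaler(microphone_signal, speaker_wave, current_loudness):
    """Least-squares scaler so that scaler * loudness * speaker_wave best fits the microphone."""
    reference = [current_loudness * s for s in speaker_wave]
    energy = sum(r * r for r in reference)
    if energy == 0.0:
        raise ValueError("speaker was silent; nothing to calibrate against")
    return sum(m * r for m, r in zip(microphone_signal, reference)) / energy

# Toy greeting (assumption): a 220 Hz tone stands in for "Good Evening Sir".
random.seed(0)
rate = 8000
greeting = [math.sin(2 * math.pi * 220 * n / rate) for n in range(4000)]
current_loudness = 0.8
true_scaler = 0.35  # unknown in practice; used here only to build the test signal

room_noise = [random.gauss(0.0, 0.01) for _ in greeting]
microphone_signal = [
    true_scaler * current_loudness * s + e for s, e in zip(greeting, room_noise)
]

feedback_scaler = estimate_feedback_scaler(microphone_signal, greeting, current_loudness)
print(f"estimated feedbackScaler: {feedback_scaler:.3f}")
```

The fit could be repeated every time Dragonfire speaks into silence, so the scaler follows changes in volume or microphone position.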

Hopefully there will be no need to account for the travel time of the sound wave from the speaker to the microphone. If that is ever needed, it is desirable to treat the sound as coming from a single speaker, since the microphone can be closer to one speaker than to the other. Still ignoring the fact that sound travels faster through the computer case than through the air:

estimatedMicrophoneNoise(t) = speakerWave(t-delay).currentLoudness.feedbackScaler
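If the delay does turn out to matter, one common way to find it is to look for the lag at which the microphone signal correlates best with the speaker signal. The following is only a sketch under simplifying assumptions: a single path, a delay that is a whole number of samples, and white noise as the test sound. The helper names are invented for the example.

```python
import random

def estimate_delay(microphone_signal, speaker_wave, max_delay):
    """Return the lag (in samples) that maximises the cross-correlation."""
    def correlation(lag):
        return sum(
            microphone_signal[n] * speaker_wave[n - lag]
            for n in range(lag, len(microphone_signal))
        )
    return max(range(max_delay + 1), key=correlation)

def delayed(wave, delay):
    """speakerWave(t - delay), padding the start with silence."""
    return [0.0] * delay + wave[:len(wave) - delay]

# Toy data (assumption): white noise played by the speaker, heard 12 samples later.
# At 8 kHz, 12 samples is 1.5 ms, roughly half a metre of air.
random.seed(1)
speaker_wave = [random.gauss(0.0, 1.0) for _ in range(2000)]
true_delay = 12
gain = 0.4  # stands for currentLoudness . feedbackScaler
microphone_signal = [gain * s for s in delayed(speaker_wave, true_delay)]

delay = estimate_delay(microphone_signal, speaker_wave, max_delay=50)
estimated_noise = [gain * s for s in delayed(speaker_wave, delay)]
residual = max(abs(m - e) for m, e in zip(microphone_signal, estimated_noise))
print("estimated delay in samples:", delay)
print("largest residual after subtraction:", residual)
```

Speech is less sharply peaked than white noise, so on real recordings the correlation peak would be broader and the estimate less clean.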

Considering that sound travels faster through the case:

estimatedMicrophoneNoise(t) = [speakerWave(t-caseDelay).feedbackScalerCase + speakerWave(t-airDelay).feedbackScalerAir].currentLoudness

Considering stereo playback:

estimatedMicrophoneNoise(t) = [speakerWaveLeft(t-caseDelayLeft).feedbackScalerCaseLeft
  + speakerWaveLeft(t-airDelayLeft).feedbackScalerAirLeft
  + speakerWaveRight(t-caseDelayRight).feedbackScalerCaseRight
  + speakerWaveRight(t-airDelayRight).feedbackScalerAirRight].currentLoudness
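The stereo formula is just a sum over four paths (case and air, left and right), each with its own delay and scaler. A small sketch of that sum, with delays in whole samples and every number invented purely for illustration:

```python
def estimate_microphone_noise(paths, current_loudness, length):
    """Sum wave(t - delay) . scaler over all paths, then apply currentLoudness.

    Each path is a (wave, delay_in_samples, scaler) tuple; every wave is
    assumed to hold at least `length` samples.
    """
    noise = [0.0] * length
    for wave, delay, scaler in paths:
        for n in range(delay, length):
            noise[n] += scaler * wave[n - delay]
    return [current_loudness * x for x in noise]

# Toy data (assumption): both speakers emit a single click at t = 0.
length = 8
speaker_wave_left = [1.0] + [0.0] * (length - 1)
speaker_wave_right = [1.0] + [0.0] * (length - 1)

paths = [
    (speaker_wave_left, 1, 0.05),   # caseDelayLeft,  feedbackScalerCaseLeft
    (speaker_wave_left, 3, 0.30),   # airDelayLeft,   feedbackScalerAirLeft
    (speaker_wave_right, 2, 0.04),  # caseDelayRight, feedbackScalerCaseRight
    (speaker_wave_right, 5, 0.20),  # airDelayRight,  feedbackScalerAirRight
]

estimated_noise = estimate_microphone_noise(paths, current_loudness=0.5, length=length)
print(estimated_noise)
```

Each click shows up in the estimate at its own delay, scaled by its own factor. Estimating eight unknowns by hand is hard, which is one argument for letting an adaptive filter learn them.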

Jul 03 '17 15:07 ProphetDaniel

@ProphetDaniel to be honest, I don't know the low-level mechanisms of the speech recognition we are using in Dragonfire, so I don't know how we could implement what you explained. Let's continue the discussion in our Gitter chat room.

Jul 03 '17 15:07 mertyildiran

I found this strategy and I believe it has great merit for noise cancellation. It uses a multiple-neuron adaptive filter. I will run some tests to check its performance.
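The link to that strategy is not preserved here, so the sketch below is not that method. It is a normalised LMS (NLMS) filter, the classic single-linear-neuron adaptive filter, shown only to illustrate how delays and scalers could be learned from the signals instead of being set by hand. The tap count, step size and toy echo path are assumptions.

```python
import random

def nlms_cancel(microphone_signal, speaker_wave, taps=8, step=0.5, eps=1e-8):
    """Adaptively subtract a filtered copy of speaker_wave from microphone_signal."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent speaker samples, newest first
    cleaned = []
    for mic, spk in zip(microphone_signal, speaker_wave):
        history = [spk] + history[:-1]
        estimate = sum(w * x for w, x in zip(weights, history))
        error = mic - estimate  # what remains after removing the estimated own voice
        norm = eps + sum(x * x for x in history)
        weights = [w + step * error * x / norm for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned, weights

# Toy echo path (assumption): the microphone hears the speaker twice,
# 2 samples late at 0.3 and 5 samples late at 0.1, and the user is silent.
random.seed(2)
speaker_wave = [random.gauss(0.0, 1.0) for _ in range(5000)]
microphone_signal = [
    0.3 * (speaker_wave[n - 2] if n >= 2 else 0.0)
    + 0.1 * (speaker_wave[n - 5] if n >= 5 else 0.0)
    for n in range(len(speaker_wave))
]

cleaned, weights = nlms_cancel(microphone_signal, speaker_wave)
print("learned weights:", [round(w, 3) for w in weights])
print("largest residual in the last 500 samples:", max(abs(e) for e in cleaned[-500:]))
```

The learned weights play the role of the feedback scalers, and their positions play the role of the delays. When the user talks at the same time, the user's voice disturbs the adaptation, so practical echo cancellers usually slow down or pause the update in that case.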

Jul 20 '17 00:07 ProphetDaniel

@ProphetDaniel as I said before, this has low priority. Also, the preferred speech recognition method is currently uncertain; maybe we will migrate to mozilla/DeepSpeech soon. We should talk on Gitter, so please tag me when you have some free time. :blush:

Jul 20 '17 15:07 mertyildiran
