SSS, sources not very distinct in separated.raw.

Open ShawnPinchbeck opened this issue 3 years ago • 10 comments

I am testing ODAS for possible use in a computer interactive art piece to isolate participants and performers giving voice commands from the ambient noise/music of the active space.

In my testing, I have the ReSpeaker 4 mic array running on a Raspberry Pi 4, with speakers 2 m away from the mic on opposite sides. I have Odas_web running on my Mac. I play voice from one speaker and ambient sounds or music from the other speaker at equal volumes. The two sources are tracked well, and their reported angles stay consistent with the speaker locations when I move them around.

The issue is that when I listen to the separated.raw file, I hear very little separation of the two sources in the audio tracks, perhaps 2 dB of difference. That is not enough isolation for voice recognition to work. I have tried adjusting settings and parameters as mentioned here, but this has not improved the isolation. How much separation is possible? Is my test flawed, in that the sounds I am playing or the way I have the speakers set up is not optimal for Odas? Is the ReSpeaker performing too poorly to function the way I imagine it should, and would a better mic array perform better? Should I be changing the angle of directivity? I'm not sure what the issue is. The sst and ssl are working perfectly.
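One way to put a number on the "perhaps 2 dB" estimate is to compare per-channel levels in separated.raw directly. The sketch below assumes the file holds interleaved, signed 16-bit little-endian PCM with 4 channels; those values are assumptions and should be checked against the cfg file in use. The function names `deinterleave` and `rms_dbfs` are hypothetical helpers, not part of ODAS.

```python
import math
import struct


def deinterleave(raw: bytes, n_channels: int = 4) -> list:
    """Split interleaved signed 16-bit little-endian PCM into per-channel lists.

    Assumption: 16-bit samples and 4 channels; adjust to match the cfg file.
    """
    frame_bytes = 2 * n_channels
    n_frames = len(raw) // frame_bytes  # ignore any trailing partial frame
    samples = struct.unpack(
        "<%dh" % (n_frames * n_channels), raw[: n_frames * frame_bytes]
    )
    return [list(samples[ch::n_channels]) for ch in range(n_channels)]


def rms_dbfs(samples) -> float:
    """Return the RMS level of a channel in dB relative to 16-bit full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square / (32768.0 ** 2))
```

A possible procedure, again only a suggestion: record once with only the voice speaker playing, read the file with `open(path, "rb").read()`, and subtract `rms_dbfs` of the channel tracking the music direction from `rms_dbfs` of the channel tracking the voice direction. The difference is a rough estimate of how much the voice leaks into the other track.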

Any suggestions or insights would be appreciated. Odas should do exactly what we need, if I can figure out how to improve the dB of separation.

Thanks! Shawn

ShawnPinchbeck avatar Jun 30 '21 17:06 ShawnPinchbeck

Have you tried postfiltered.raw? I found better separation in that file than in separated.raw. By the way, can we deliberately set the number of sources, and how is it defined?

Quang-Kien avatar Jul 04 '21 04:07 Quang-Kien

Thanks for the reply! The postfiltered.raw has too many artifacts and isn't useful for voice recognition.

I'm not sure if you can change the number of sources. You can change the tracking sensitivity and how long a source stays tracked while it is silent, and you can adjust the angle of sensitivity to block out erroneous noise from directions you don't want to detect.

I'm wondering if the separation issue is related to the ReSpeaker's quality and number of microphones? I'm not sure how much of a factor this is. I don't have another mic to test this.

ShawnPinchbeck avatar Jul 05 '21 19:07 ShawnPinchbeck

Hi there,

This mainly depends on the room acoustics. In some cases the GSS module can do a decent job, but sometimes it gets more difficult. You are right: the post-filtered version should not be used with an ASR system, as it introduces some distortion.

We are currently working on PyODAS, which will be in Python, and will include some more recent DL-based methods to boost separation results. Stay tuned :)

FrancoisGrondin avatar Jul 05 '21 21:07 FrancoisGrondin

Hi Francois, I was wondering if room acoustics were a factor in the SSS. I'm testing in a pretty small area. I'll have to try it out in our studio to see what the difference is.

I'm definitely tuned in! :-) What is your timeline for PyODAS release?

Cheers!

ShawnPinchbeck avatar Jul 08 '21 16:07 ShawnPinchbeck

Hopefully by the end of fall 2021 :)

FrancoisGrondin avatar Jul 09 '21 14:07 FrancoisGrondin

> Hi Francois, I was wondering if room acoustics were a factor in the SSS. I'm testing in a pretty small area. I'll have to try it out in our studio to see what the difference is.
>
> I'm definitely tuned in! :-) What is your timeline for PyODAS release?
>
> Cheers!

Please share your test results. Indeed, the postfiltered output is not recognized well, even though playback shows clear separation. On the other hand, the separated version exhibits strong mixing and is not usable for speech recognition either.

Quang-Kien avatar Jul 10 '21 09:07 Quang-Kien

> Hopefully by the end of fall 2021 :)

Great to hear. Will it be a Python wrapper?

By the way, I found that the raw output always consists of 4 channels. In our case we just want to separate two sources. Is there any way to set the number of sources, at least for the output?
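Whether ODAS itself can be configured to emit fewer channels is not answered in this thread. As a workaround, the channels of interest can be picked out of the 4-channel file after the fact. The sketch below assumes interleaved, signed 16-bit little-endian PCM and a 16000 Hz sample rate; both are assumptions that must match the cfg file in use. `select_channels` and `write_wav` are hypothetical helper names.

```python
import struct
import wave


def select_channels(raw: bytes, keep, n_channels: int = 4) -> bytes:
    """Keep only the listed channel indices from interleaved 16-bit PCM.

    Assumption: signed 16-bit little-endian samples, n_channels per frame.
    """
    frame_bytes = 2 * n_channels
    n_frames = len(raw) // frame_bytes  # ignore any trailing partial frame
    samples = struct.unpack(
        "<%dh" % (n_frames * n_channels), raw[: n_frames * frame_bytes]
    )
    out = []
    for frame in range(n_frames):
        base = frame * n_channels
        out.extend(samples[base + ch] for ch in keep)
    return struct.pack("<%dh" % len(out), *out)


def write_wav(dest, pcm: bytes, n_channels: int, sample_rate: int = 16000) -> None:
    """Write 16-bit PCM to a WAV file (dest is a path or a binary file object).

    Assumption: sample_rate equals the rate configured for the separated output.
    """
    with wave.open(dest, "wb") as wav_file:
        wav_file.setnchannels(n_channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16 bits
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
```

For example, `write_wav("two_sources.wav", select_channels(raw, [0, 1]), 2)` would keep the first two tracks. Which channel indices correspond to which tracked source has to be determined by listening or by inspecting the tracking output.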

Also, please share with us a more detailed explanation of the parameters in the cfg file if you have one. Your help is appreciated.

BRs

Quang-Kien avatar Jul 10 '21 09:07 Quang-Kien

I'm still unclear, what does the SSS data do? I know it's related to beamforming but does it provide the unit vector location of the sound source in 3D space?

fanman2014 avatar Sep 10 '21 03:09 fanman2014

PyODAS would be amazing. Have you got any updates, Francois?

StuartIanNaylor avatar Feb 22 '22 16:02 StuartIanNaylor

@FrancoisGrondin is PyODAS the same project as SpeechBrain? I discovered your name under that project and noticed it overlaps with some of the work on ODAS.

atyshka avatar Nov 30 '22 19:11 atyshka