VocalMat Time Jump (missing all the USVs during that time)

Hi,

First of all, very nice work!

I am currently trying your VocalMat pipeline and noticed something quite surprising. I was comparing the VocalMat detections in one of my 4-min recordings with a DeepSqueak detection that I had corrected (so it was my "ground truth"), and I realized that at some point the VocalMat frames jump from 89 s to 92 s, with several very obvious USVs not being detected at all. It didn't make sense, because VocalMat was able to detect USVs with similar characteristics in the previous frames and the signal for these USVs is quite good. Do you have any idea why this problem may occur? I tried changing the overlap between segments (originally set to 5 s) and the segment length (originally set to 1 min), but the problem remained. FYI: I tried MATLAB 2019a and 2023b.

Thank you very much for your help.

Dec 07 '23 16:12 Cerilam

Sorry for the delay! This notification got buried under many other emails over the break.

My initial thought is that the USVs happened too close to the beginning of the 1-min segment, so the model would still be learning the difference between background and foreground (i.e., USVs). However, I would expect the overlap between neighboring segments to catch and correct these cases.
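One way to check this is to list where the segment boundaries fall relative to the missed events. The following is a minimal sketch, assuming segments are cut as fixed-length windows and each new window starts one overlap earlier than the end of the previous one. It only illustrates the idea and is not VocalMat's actual segmentation code; `segment_windows` and `segments_covering` are hypothetical helpers.

```python
def segment_windows(total_s, segment_s=60.0, overlap_s=5.0):
    """Return (start, end) times in seconds for overlapping segments.

    Assumption: each new segment starts `segment_s - overlap_s` after the
    previous one. VocalMat's real segmentation may differ.
    """
    if overlap_s >= segment_s:
        raise ValueError("overlap must be shorter than the segment")
    step = segment_s - overlap_s
    windows = []
    start = 0.0
    while start < total_s:
        windows.append((start, min(start + segment_s, total_s)))
        start += step
    return windows


def segments_covering(t, windows):
    """Indices of the segments whose time span contains time t (seconds)."""
    return [i for i, (a, b) in enumerate(windows) if a <= t <= b]


windows = segment_windows(240.0)          # a 4 min recording
print(windows)
print(segments_covering(90.0, windows))   # which segments see t = 90 s
```

If the missed events sit well inside a window under the settings actually used, the segment-start explanation becomes less likely.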

If you are still experiencing problems, would you mind sharing this recording in which you are having problems? I can try to take a look and see if I can understand the problem.

Jan 10 '24 14:01 ahof1704

Hello, thank you for this reply. We would really like to use your tool, so it would be great if we could find a solution. I actually played a bit with the overlap as well as the size of the segments, but it had no effect on the accuracy of the detections. I also lowered the minimum frequency threshold, which didn't solve it either. I then realised that this happens for frequencies below roughly 60 kHz, and all of our recordings contain some static, constant noise in that frequency range (around 50 kHz +/- 10 kHz). So my guess now is that this noise, which is constant and lasts for the whole recording, interferes with the median filter and that this might be causing the issue... I hope we can work out some kind of solution. I would be glad to send you a couple of recordings. We also sent you an email about this problem, with some pictures to illustrate it. Thank you for the help.

Jan 11 '24 14:01 Cerilam

I see. Yes, the constant noise can be a problem, since the background noise is computed over the whole frequency range within a time window rather than band-wise. Thus, it is likely that the noise is being detected as a "very long USV", leading to the loss of the individual USVs that overlap with it. We played with potential solutions for this type of problem, but we later realized that it was easier to control the experimental setup (i.e., use better acoustic insulation) than to try to handle all types of noise.
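The difference between a global and a band-wise background estimate can be illustrated on a toy spectrogram. This is a minimal sketch in Python rather than VocalMat's MATLAB code, and it does not reproduce VocalMat's actual thresholding; `spec` is a made-up power matrix (rows are frequency bins, columns are time frames) and all function names are hypothetical.

```python
from statistics import median


def global_background(spec):
    """One background level for the whole window (all bins pooled)."""
    return median(v for row in spec for v in row)


def bandwise_background(spec):
    """One background level per frequency bin (median over time)."""
    return [median(row) for row in spec]


def contrast_global(spec):
    """Subtract the single pooled background level from every value."""
    bg = global_background(spec)
    return [[v - bg for v in row] for row in spec]


def contrast_bandwise(spec):
    """Subtract each frequency bin's own background level from that bin."""
    bgs = bandwise_background(spec)
    return [[v - bg for v in row] for row, bg in zip(spec, bgs)]


# Toy spectrogram: 3 frequency bins x 6 time frames.
# Row 1 carries a constant noise floor of 5; a call adds +3 at frames 2-3.
spec = [
    [1, 1, 1, 1, 1, 1],   # quiet band
    [5, 5, 8, 8, 5, 5],   # noisy band with a call on top
    [1, 1, 4, 4, 1, 1],   # quiet band with the same call strength
]

print(contrast_global(spec)[1])    # the whole noisy row stands out
print(contrast_bandwise(spec)[1])  # only the call frames stand out
```

With the pooled estimate, the entire noisy band sits above the background and looks like one long foreground object. With a per-bin estimate, the call has the same contrast in the noisy band as in the quiet one.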

Have you checked the length of the detected USVs? Do you see any detected USVs that are unusually long?

In any case, happy to take a look at the sample if you want.

Jan 11 '24 14:01 ahof1704

Hello, following up on this post (a team member posted it). It doesn't detect an abnormally long USV; in fact, the noise we are referring to is really constant and is never detected by VocalMat, but neither are the vocalizations that overlap it. I think you will get a better understanding from these images. I will soon send you a sample so that you can have a better idea. As for the experimental setup, the experiments are done and we cannot redo them; despite the noise, the recordings are still quite clear, so I am hopeful that we can find some kind of workaround. Here I am posting some images of USVs that are cut off because they step into the "noise band", but keep in mind that when they are fully inside the "noise band" they are not detected at all (hence the 'time jumps'). Thank you!

Jan 11 '24 14:01 jess255

Hi again, Here is a link to download the audio file (it corresponds to the images).

https://filesender.renater.fr/?s=download&token=4fd767b6-a6e1-49d9-9b57-0a37ad84b577

Jan 11 '24 14:01 jess255

Hi @jess255, I don't mean to add noise to this convo when @ahof1704 is trying to help you, but could I ask if you've tried any of the following?

[ ] just subtract off a DC offset to see if it helps with noise, something like 'audio - mean(audio)'
[ ] try something a little more heavy-duty, like https://github.com/timsainb/noisereduce/
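
For reference, the first suggestion amounts to something like the following. This is a minimal sketch in plain Python on a list of samples; reading and writing the WAV file is left out, and `remove_dc_offset` is a hypothetical helper name.

```python
from statistics import fmean


def remove_dc_offset(samples):
    """Subtract the mean so the waveform is centred on zero."""
    offset = fmean(samples)
    return [s - offset for s in samples]


audio = [0.52, 0.48, 0.55, 0.45, 0.50]   # toy samples with a +0.5 offset
centred = remove_dc_offset(audio)
print(fmean(centred))                    # close to 0
```

Subtracting the mean only shifts the waveform: it removes the 0 Hz component and leaves all other frequencies unchanged.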

Just chiming in because I sympathize greatly with "we already ran the experiments and can't go back :smiling_face_with_tear: "

Jan 11 '24 14:01 NickleDave

Hello @NickleDave, we didn't try the first one; I will give it a go, but I wonder whether it would interfere with the detection or rule out some less intense events, which could lead us to miss events as well... As for the second one, I did try that package, but the denoising produces an uneven background, and when I feed the transformed .wav file to VocalMat it doesn't detect anything and crashes... Thank you for kindly offering your help though!

Jan 11 '24 15:01 jess255

Hi @ahof1704, sorry for bothering you once again. I was just wondering whether you have had time to check the audio; I was not sure you saw the link, since I forgot to tag you. I would just like to know if you have any idea of a possible solution or if there's nothing we can do. Let me know. Thank you very much!

Jan 17 '24 16:01 jess255

Sorry, I haven't had time yet. I will do my best to look into it this week. I will make sure to keep you posted.

Regarding the images you shared, I can see the noise in the first image (it is shown as an almost vertical column of noise, which I believe is what you mean by noise band). We have associated this type of noise with echo during recording or saturation of the microphone. Unfortunately, cases like this are hard to solve, and they motivated the redesign of our recording chambers to include anechoic material on the walls and to move the microphone further away from the animals.

I am confused by the second image though. I can see that some noise was detected for a very high frequency, but the USV was detected for most of it. Is this detection of high-frequency noise a big issue?

Jan 17 '24 16:01 ahof1704

Hello @ahof1704, thank you for the reply! The noise I am referring to is not the vertical band, but rather a horizontal one around 50 kHz. You can see that around that frequency the background is slightly brighter, and I believe this is causing the issue. The high-frequency noise has not been problematic as far as I have checked; VocalMat usually still performs well despite it.

Jan 17 '24 17:01 jess255

I see. OK, I will ask some friends to use their Avisoft license to visualize your sample better. Maybe we can play with how the foreground/background threshold is computed. I'll let you know how it goes.

Jan 17 '24 18:01 ahof1704

Hi @jess255, jumping in to try and help. Is this audio the one you mentioned in your original post, which should have vocals in the 89-92 s range that VocalMat is not detecting? I checked the audio and I don't see any vocals in that range. If it is the correct audio, can you share the timestamps of the vocals you expected to be detected but were not? Thanks.

Jan 17 '24 19:01 gumadeiras

Hello @gumadeiras! Sorry, it's not the same one. I tried several recordings, and that's how I realized that there were missed detections. I have to go through my notes in the lab tomorrow to find the one I mentioned in the first post. I can also share it with you once I find it! Thanks a lot for the help!

Jan 17 '24 19:01 jess255

Hello @gumadeiras @ahof1704, here's the file I first mentioned and some pictures of the events I was referring to.

https://filesender.renater.fr/?s=download&token=d86f2b2f-d631-4ff7-b74a-699323a24ba0

(Attached images: 107, 108, 109, 105, 106)

Thank you for the help!

Jan 18 '24 10:01 jess255

Hi @jess255, sorry for the huge delay on this. I tried a bunch of narrow-band filters to remove the background noise in the ~49 kHz range and in other ranges too, and also tried removing the broadband noise around the vocals, but to no avail. This will require some deeper digging to understand why the segmentation is missing all these vocals. I suspect there is some anomaly in the computed power spectral density that causes the contrast normalization/thresholding step to throw out these vocal segments.
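
For anyone who wants to try a narrow-band filter on their own recordings, a notch filter can be sketched as below. This is a minimal pure-Python biquad notch using the widely used Audio EQ Cookbook coefficients, not the filters actually tried above; the 250 kHz sampling rate, the 49 kHz centre frequency, the Q value, and all function names are assumptions chosen for illustration.

```python
import math


def notch_coefficients(f0, fs, q=10.0):
    """Biquad notch (Audio EQ Cookbook form), normalised so a0 == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def biquad(samples, b, a):
    """Apply the filter with the direct form I difference equation."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


fs = 250_000.0                      # assumed sampling rate (Hz)
b, a = notch_coefficients(49_000.0, fs)
n = 20_000
noise = [math.sin(2 * math.pi * 49_000.0 * i / fs) for i in range(n)]
call = [math.sin(2 * math.pi * 70_000.0 * i / fs) for i in range(n)]

# Skip the first half to ignore the filter's start-up transient.
print(rms(biquad(noise, b, a)[n // 2:]))   # tone at the notch: strongly attenuated
print(rms(biquad(call, b, a)[n // 2:]))    # tone at 70 kHz: close to the input RMS
```

A notch like this removes a pure tone at the centre frequency but leaves a broader noise band (for example 50 kHz +/- 10 kHz) largely in place, which is consistent with the filtering not fixing the detections.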

Apr 01 '24 19:04 gumadeiras

Hello @gumadeiras, thanks for trying to solve it! I was taking a look at the code and, to be honest, I don't have the skills to fully understand it, but something caught my attention: there seems to be some kind of threshold (see the attached picture). Could changing this threshold, for instance lowering it, solve the problem, or is it totally unrelated? Thank you again for looking for solutions. Best,

Jessica

Apr 02 '24 11:04 jess255
