SQUIM running in real-time

Open balkce opened this issue 9 months ago • 0 comments

I applied SQUIM to assess speech quality as a way to correct the direction-of-arrival of a location-based speech enhancement system. More info here.

Every 0.1 seconds, I feed SQUIM the most recent 3-second window of the input. It responds in less than that interval: the maximum response time I measured was 0.0704 seconds. So, in terms of response time, SQUIM seems able to run in real time.
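For reference, the windowing scheme described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the paper: `SlidingWindow`, `run_stream`, and the `estimate` callable are hypothetical names, and the 16 kHz sample rate is an assumption. In a real setup, `estimate` would wrap the SQUIM objective model (for example, the one from `torchaudio.pipelines.SQUIM_OBJECTIVE`) and return its SI-SDR output.

```python
from collections import deque

SAMPLE_RATE = 16000    # assumption: sample rate of the audio fed to the model
WINDOW_SECONDS = 3.0   # window length described in this issue
HOP_SECONDS = 0.1      # interval between evaluations described in this issue


class SlidingWindow:
    """Keeps only the most recent `window_seconds` of audio samples."""

    def __init__(self, sample_rate=SAMPLE_RATE, window_seconds=WINDOW_SECONDS):
        self.capacity = int(round(sample_rate * window_seconds))
        # A bounded deque silently drops the oldest samples when it is full.
        self.samples = deque(maxlen=self.capacity)

    def push(self, hop):
        """Append one hop of new samples."""
        self.samples.extend(hop)

    def is_full(self):
        return len(self.samples) == self.capacity

    def window(self):
        """Return the current window as a list, oldest sample first."""
        return list(self.samples)


def run_stream(hops, estimate, window):
    """Feed hops one by one and evaluate `estimate` once the window is full.

    `estimate` is a hypothetical callable mapping a list of samples to a
    single quality score (e.g. a wrapper around the SQUIM objective model).
    """
    scores = []
    for hop in hops:
        window.push(hop)
        if window.is_full():
            scores.append(estimate(window.window()))
    return scores
```

With a 0.1-second hop at 16 kHz, each hop holds 1600 samples, so the first estimate is produced after 30 hops; every later hop yields one more estimate over the latest 3 seconds.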

However, it struggles to provide a consistent speech quality assessment over time. I'm using the SI-SDR metric from the objective model. With a speech recording that has no enhancement or spatial variation applied, the ideal behavior would be for SQUIM to report the same SI-SDR value over time but, as can be seen in Figure 2 of the aforementioned paper, it does not. The estimate varies wildly, so it required some smoothing to work well with the rest of the system.
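To make "some smoothing" concrete, here is one common option, an exponential moving average over the per-window SI-SDR estimates. This is only a sketch under assumptions: the `ExponentialSmoother` class and the `alpha` value are hypothetical, and it is not necessarily the smoothing used in the paper.

```python
class ExponentialSmoother:
    """Exponential moving average over a stream of scalar estimates.

    Hypothetical helper: `alpha` close to 0 smooths heavily (slow response),
    `alpha` equal to 1 disables smoothing entirely.
    """

    def __init__(self, alpha=0.1):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = None  # no estimate seen yet

    def update(self, x):
        """Fold a new raw estimate into the running average and return it."""
        x = float(x)
        if self.value is None:
            # Initialize with the first observation to avoid a start-up bias.
            self.value = x
        else:
            self.value = self.alpha * x + (1.0 - self.alpha) * self.value
        return self.value
```

The trade-off is latency: a smaller `alpha` suppresses more of the variation between consecutive windows, but the smoothed value also takes longer to follow a genuine change in quality.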

So here are my questions:

  • Is it possible to modify SQUIM for this type of real-time application? I'm assuming it would need some sort of causality built into it. Or not? I was actually impressed that it was able to provide a workable result without any modification. Maybe fine-tuning would be enough?
  • If so, what steps would you recommend I take to fine-tune SQUIM? I've taken a look at this paper that @nateanl provided to another user who asked about it (in #3424), but it is still not clear to me how I should proceed.
  • Is SQUIM the best alternative for this? I've looked at other techniques for non-reference speech quality assessment, and it seems SQUIM is up there with the best of them for offline applications. But for real-time scenarios, I'm not sure.

Thank you in advance for any help/guidance you can provide. I'm open to help out in any way, if need be, to make SQUIM work better in real-time applications.

balkce · Jan 09 '25 19:01