
transform mapping

Open falseywinchnet opened this issue 1 year ago • 0 comments

@grrrr

Hi, my GitHub: https://github.com/falseywinchnet/streamcleaner

So, recently, I was reading about Gabor filters and learned of your excellent constant-Q Gabor transform.

Not sure I am using it right, but, e.g.:

```python
from nsgt import NSGT, LogScale, LinScale, MelScale, OctScale, SndReader

rate = 48000
test = data[0:rate]
scl = OctScale(60.41, 22050, 48)

nsgt = NSGT(scl, fs=rate/2, Ls=rate, real=True, matrixform=True, reducedform=0)

# forward transform
c = nsgt.forward(test)
s_r = nsgt.backward(c)
```

It complains about the Q-factor being too high:

```
/usr/local/lib/python3.9/dist-packages/nsgt/nsgfwin_sl.py:64: UserWarning: Q-factor too high for frequencies 60.41,61.29,62.18,63.08,64.00,64.93,65.87,66.83,67.80,68.78,69.78,70.80,71.83,72.87,73.93,75.00,76.09,77.20,78.32,79.46,80.61,81.78,82.97,84.18,85.40,86.64,87.90,89.18,90.47,91.79,93.12,94.48,95.85,97.24,98.65,100.09,101.54,103.02,104.51,106.03,107.57,109.14,110.72,112.33,113.96,115.62,117.30,119.00,120.73,122.49,124.26,126.07,127.90,129.76,131.65,133.56,135.50,137.47
```

But it seems to reconstruct OK (error 4.298e-16). Please advise on settings for speech, with an emphasis on SSB (<4 kHz).
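For reference, here is a rough guess at what speech-oriented settings might look like, based on my reading of the flagged frequencies in the warning (they appear to be the bins whose bandwidth falls below roughly four times the fs/Ls resolution); these values are illustrative only and I am not sure they are what you would recommend:

```python
# Rough guess at speech/SSB (<4 kHz) settings -- illustrative only, not verified:
# fs set to the actual sample rate, fmax capped near 4 kHz, and fewer bins per
# octave so that no low-frequency bin is narrower than ~4 * fs/Ls in bandwidth,
# which seems to be what triggers the Q-factor warning.
from nsgt import NSGT, OctScale

rate = 48000
Ls = rate                        # one second of signal
scl = OctScale(80, 4000, 12)     # fmin, fmax, bins per octave -- guesses for speech
nsgt_speech = NSGT(scl, fs=rate, Ls=Ls, real=True, matrixform=True, reducedform=0)
```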

Something I noted that I was meaning to ask about is what I will call, for lack of a better term, "transform mapping".

That is to say, different transforms which are all reversible have different certainty in time and frequency, such that something you do to one representation will show up in the other.

For example, in the conventional STFT form (e.g., NFFT=512, hop=128), a typical speech waveform will have harmonics buried in the noise which are not distinguishable from it.

However, if the audio is first transformed into the Gabor representation as described, thresholded below the lowest harmonic (using a statistical approach I developed called atd), and then inverted and transformed again into the STFT, these tertiary harmonics now dominate the residual structure of the corresponding regions of the spectrogram, making them visible.

So each bin, or region of bins, in one representation maps (in a complex manner, because multiple convolutional steps are involved) to a bin or region of bins in another representation. If, for each STFT bin, you can identify the set of corresponding Gabor bins, take the maximum value over those bins, and map it back to that STFT bin, additional structure and dimensionality not typically apparent will manifest in the data set due to the emphasis applied (and may therefore be useful for masking).
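A rough sketch of the max-over-corresponding-bins mapping I have in mind, under assumed shapes: an STFT magnitude array of shape (bins, frames), a matrixform NSGT coefficient array, and a list of NSGT bin centre frequencies (e.g. taken from the scale object). The function name and the nearest-neighbour time alignment are mine, not anything in nsgt:

```python
import numpy as np

def map_nsgt_max_to_stft(stft_mag, nsgt_coefs, nsgt_freqs, fs, nfft=512):
    """For each STFT frequency bin, take the max NSGT magnitude over the NSGT
    bins whose centre frequencies fall inside that STFT bin (hypothetical helper)."""
    n_stft_bins, n_stft_frames = stft_mag.shape
    nsgt_mag = np.abs(np.asarray(nsgt_coefs))        # (n_nsgt_bins, n_nsgt_frames)
    # crude nearest-neighbour resampling of the NSGT time axis onto the STFT frames
    frame_idx = np.round(np.linspace(0, nsgt_mag.shape[1] - 1, n_stft_frames)).astype(int)
    nsgt_mag = nsgt_mag[:, frame_idx]
    # assign each NSGT bin to the STFT bin containing its centre frequency
    stft_bin_of = np.clip(np.round(np.asarray(nsgt_freqs) / (fs / nfft)).astype(int),
                          0, n_stft_bins - 1)
    emphasis = np.zeros_like(stft_mag)
    for k in range(n_stft_bins):
        members = nsgt_mag[stft_bin_of == k]
        if members.size:
            emphasis[k] = members.max(axis=0)        # max over the corresponding Gabor bins
    return emphasis
```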

Could this, in turn, allow the use of perfect reconstruction (in the STFT domain) together with additional alternative transforms to better signify the time-frequency localization of energy that is buried (convolutionally distributed) in the STFT form?

I am presently examining this scenario, but I would appreciate a little insight on how best to use nsgt for speech.

Additionally, I noted that when I applied my time-domain masking method (called fast_entropy) to the Gabor short-term transform, then inverted the remaining bins and once again applied the STFT, some of the dominant frequency components (the speech) which had been masked in the time domain were reconstructed into the waveform by the inversion, owing to the complex reconstructive properties, such that the final product was improved further.
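Roughly the flow I mean, with `fast_entropy_mask` standing in for my own masking code (a hypothetical name, not part of nsgt), and with the exact coefficient container possibly needing adjustment depending on the nsgt version:

```python
import numpy as np
from scipy.signal import stft

c = list(nsgt.forward(test))                  # NSGT coefficients, one array per band
C = np.asarray(c)                             # matrixform=True -> all bands same length
masked = C * fast_entropy_mask(np.abs(C))     # hypothetical mask applied in the Gabor domain
cleaned = nsgt.backward(list(masked))         # invert the remaining bins back to a waveform
f, t, Z = stft(cleaned, fs=rate, nperseg=512, noverlap=512 - 128)  # back into the STFT view
```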

I will have to do more research on this.

I read somewhere that the Gabor transform uses a Gaussian function. I have developed an interesting alternative window function which is not suitable for reconstruction, but which corresponds to maximal energy localization and minimal distortion in the complex domain. You might use it, e.g., to generate a mask of the same dimensions and then apply it to a representation generated with a reversible window (perhaps also using a synthesis window). This window is a double inverted logit window; the code for it is here:

https://github.com/falseywinchnet/streamcleaner/blob/master/realtime_interactive.py#L102

I would like to know if I can combine this with the Gabor transform.

It seems that the Gabor transform uses a Hann window (https://github.com/grrrr/nsgt/blob/master/nsgt/nsgfwin.py), but does not yet (possibly?) make use of improved-reconstruction synthesis windows; see the logic in https://pyroomacoustics.readthedocs.io/en/pypi-release/pyroomacoustics.transform.stft.html#pyroomacoustics.transform.stft.compute_synthesis_window
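If I read that logic right, the synthesis window is essentially the analysis window divided by the sum of its squared hop-shifted copies, so that analysis times synthesis overlap-adds to one. A minimal sketch of the idea (my own reading, not pyroomacoustics' exact code):

```python
import numpy as np

def synthesis_window(w, hop):
    """Minimal-norm dual window for weighted overlap-add with analysis window w."""
    w = np.asarray(w, dtype=float)
    n = len(w)
    denom = np.zeros(n)
    for k in range(-(n // hop), n // hop + 1):
        idx = np.arange(n) - k * hop             # positions sampled by the k-th shifted copy
        valid = (idx >= 0) & (idx < n)
        denom[valid] += w[idx[valid]] ** 2       # sum of squared hop-shifted analysis windows
    return w / denom
```

Whether this carries over to the non-stationary windows inside nsgt is exactly what I am unsure about.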

Also, does the Gabor transform suffer from the frequency instability mentioned in https://dsp.stackexchange.com/questions/72588/synchrosqueezed-stft-phase-transform/72590#72590 ?

In terms of the practical ramifications or modifications necessary, DFT cisoid centering is applied simply by padding the input, circularly rotating each segment by half its length (equivalently, windowing with an ifftshift-ed window), and then, for the inverse, applying fftshift to each segment:

https://github.com/falseywinchnet/streamcleaner/blob/master/realtime.py#L350
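A minimal illustration of what I mean by that centering, simplified and not the exact code at the link above:

```python
import numpy as np

nfft = 512
win = np.hanning(nfft)

def analyze_frame(frame):
    # rotate the windowed frame so its centre lands at index 0: the DFT phase is
    # then referenced to the frame centre ("cisoid centering")
    return np.fft.rfft(np.fft.ifftshift(frame * win))

def synthesize_frame(spectrum):
    # inverse FFT, then rotate back with fftshift to undo the centering
    return np.fft.fftshift(np.fft.irfft(spectrum, n=nfft))
```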

Best regards

falseywinchnet · Mar 10 '23 17:03