mlx icon indicating copy to clipboard operation
mlx copied to clipboard

[Feature] Support Short-Time Fourier Transforms

Open Rifur13 opened this issue 10 months ago • 8 comments

Short-Time Fourier Transforms (STFT) and its inverse are used extensively in signal processing.

Function definition will match scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.stft.html https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.istft.html

Rifur13 avatar Apr 18 '24 01:04 Rifur13

You can assign this to me.

Rifur13 avatar Apr 18 '24 01:04 Rifur13

Cool.. if we include this, I wonder where it would go? I'm not opposed to including it in core or a sub-package but I think there is a decision to be made there.

I see jax implemented it in a scipy.signal subpackage https://jax.readthedocs.io/en/latest/_autosummary/jax.scipy.signal.stft.html, and PyTorch implemented it at the top level https://pytorch.org/docs/stable/generated/torch.stft.html

Also our Whisper implementation already has an STFT in MLX you might find useful: https://github.com/ml-explore/mlx-examples/blob/main/whisper/whisper/audio.py#L104-L127

awni avatar Apr 18 '24 02:04 awni

I think adding them in core is best - they’re fundamental algorithms and used frequently. A separate signal sub-package makes sense if you anticipate adding more functions from scipy.signal in he near future. (https://jax.readthedocs.io/en/latest/jax.scipy.html#module-jax.scipy.signal)

Thanks for pointing me to the Whisper implementation, I haven't seen in yet

Rifur13 avatar Apr 18 '24 03:04 Rifur13

One tiny issue about stft in mlx-example/whisper, the noverlap parameter does not actually represent the overlap of frames; instead, it is semantically equivalent to hop_length in torch.stft.

I found it a little bit misleading while implementing istft.

tqtifnypmb avatar Apr 30 '24 09:04 tqtifnypmb

Is it not the same as noverlap in the Scipy implementation? https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.stft.html

I think that's a good reference to follow.

awni avatar Apr 30 '24 13:04 awni

Still working on this CL, shouldn't be much longer.

t = (x.size - nperseg + noverlap) // noverlap does look wrong though, maybe that's where the confusion is. The denominator should be the number of steps.

Rifur13 avatar Apr 30 '24 14:04 Rifur13

Is it not the same as noverlap in the Scipy implementation? https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.stft.html

I think that's a good reference to follow.

No.

According to this hop_length = nperseg - noverlap

In stft in mlx-example/whisper noverlap is being used as hop_length. It appears to be a naming mistake, the logic looks good to me.

tqtifnypmb avatar Apr 30 '24 15:04 tqtifnypmb

Still working on this CL, shouldn't be much longer.

Cool.

I was implementing a istft specific to my problem, not a general one.

tqtifnypmb avatar Apr 30 '24 15:04 tqtifnypmb