mlx
mlx copied to clipboard
[Feature] Support Short-Time Fourier Transforms
Short-Time Fourier Transforms (STFT) and its inverse are used extensively in signal processing.
Function definition will match scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.stft.html https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.istft.html
You can assign this to me.
Cool.. if we include this, I wonder where it would go? I'm not opposed to including it in core or a sub-package but I think there is a decision to be made there.
I see jax implemented it in a scipy.signal
subpackage https://jax.readthedocs.io/en/latest/_autosummary/jax.scipy.signal.stft.html, and PyTorch implemented it at the top level https://pytorch.org/docs/stable/generated/torch.stft.html
Also our Whisper implementation already has an STFT in MLX you might find useful: https://github.com/ml-explore/mlx-examples/blob/main/whisper/whisper/audio.py#L104-L127
I think adding them in core is best - they’re fundamental algorithms and used frequently. A separate signal
sub-package makes sense if you anticipate adding more functions from scipy.signal
in he near future. (https://jax.readthedocs.io/en/latest/jax.scipy.html#module-jax.scipy.signal)
Thanks for pointing me to the Whisper implementation, I haven't seen in yet
One tiny issue about stft
in mlx-example/whisper, the noverlap
parameter does not actually represent the overlap of frames; instead, it is semantically equivalent to hop_length
in torch.stft
.
I found it a little bit misleading while implementing istft
.
Is it not the same as noverlap
in the Scipy implementation? https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.stft.html
I think that's a good reference to follow.
Still working on this CL, shouldn't be much longer.
t = (x.size - nperseg + noverlap) // noverlap
does look wrong though, maybe that's where the confusion is. The denominator should be the number of steps.
Is it not the same as
noverlap
in the Scipy implementation? https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.stft.htmlI think that's a good reference to follow.
No.
According to this hop_length = nperseg - noverlap
In stft in mlx-example/whisper noverlap
is being used as hop_length
. It appears to be a naming mistake, the logic looks good to me.
Still working on this CL, shouldn't be much longer.
Cool.
I was implementing a istft
specific to my problem, not a general one.