Add istft operation
Close #2016
This code adds the ISTFT op.
It was a bit challenging to work through because I needed to learn the available MIL operations and adapt the process.
I based myself on this PR, as suggested by @junpeiz.
There are still many things I'm not confident with and I would like some help if possible:
- Understanding the notation used in MIL ops
- Proper testing of the function
- Why doesn't `STFT` use `center`? The same would apply to `ISTFT`.
- Check if the `for` loop in the OLA (overlap-add) function is fine, or whether I need to use some special computation (see the reference sketch after this list).
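For context, here is a minimal NumPy sketch of what the overlap-add step computes (only an illustration of the algorithm, not the MIL implementation in this PR; the helper name `overlap_add` is hypothetical):

```python
import numpy as np

def overlap_add(frames: np.ndarray, hop_length: int) -> np.ndarray:
    """Reconstruct a signal from windowed time-domain frames.

    frames: shape (num_frames, frame_length), e.g. the inverse FFT of each
    STFT column, already multiplied by the synthesis window.
    """
    num_frames, frame_length = frames.shape
    out_length = (num_frames - 1) * hop_length + frame_length
    out = np.zeros(out_length, dtype=frames.dtype)
    # The per-frame loop in question: each frame is shifted by hop_length
    # samples and accumulated into the output buffer.
    for i in range(num_frames):
        start = i * hop_length
        out[start:start + frame_length] += frames[i]
    return out
```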
Thank you for your contributions! Left some comments.
Maybe it's because this PR is still in the draft stage (I noticed the `Draft` in the PR title), but I just want to mention that there are some undefined vars which trigger a flake8 check error: https://gitlab.com/coremltools1/coremltools/-/jobs/5539997442.
This should be fixed now.
I still don't understand how you handle the `center` parameter in the STFT implementation. We should also handle it in the ISTFT in a similar manner.
@alealv That's a really good question.
As shown in the PyTorch STFT doc, `center` is only used for padding the input. The `center` parameter is handled by PyTorch ops lowering, which means coremltools does not even see this parameter when it looks at the Torch IR graph.
More specifically, when we lower the STFT op in the STFT lowering function, the `input_data` already reflects the `center` parameter, so `center` is not used in `mb.complex_stft`. To better understand it, feel free to play with the `test_stft` test case in coremltools/converters/mil/frontend/torch/test/test_torch_ops.py: when `input_shape` is (1, 32) and `center=False`, the `input_data` in the STFT lowering function is still (1, 32). However, when `center` is True, the `input_data` in the STFT lowering function becomes (1, 48).
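To make the shape example above concrete, here is a small PyTorch sketch of what `center=True` does on the PyTorch side before lowering (just an illustration of that behavior, not coremltools code):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 32)
n_fft = 16
window = torch.hann_window(n_fft)

# With center=True, PyTorch pads the signal with n_fft // 2 samples on each
# side (reflect padding by default) before taking the STFT, so the lowered
# graph already sees the padded (1, 48) input and no `center` argument.
padded = F.pad(x, (n_fft // 2, n_fft // 2), mode="reflect")
assert padded.shape == (1, 48)

# The two calls below produce the same result; the second form is what the
# lowering effectively receives.
s1 = torch.stft(x, n_fft=n_fft, window=window, center=True, return_complex=True)
s2 = torch.stft(padded, n_fft=n_fft, window=window, center=False, return_complex=True)
assert torch.allclose(s1, s2)
```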
I finally fixed all issues and the tests pass. The only result I found weird is this:
```python
elif length and return_complex:
    with pytest.raises(ValueError, match="New var type `<class 'coremltools.converters.mil.mil.types.type_tensor.tensor.<locals>.tensor'>` not a subtype of existing var type `<class 'coremltools.converters.mil.mil.types.type_tensor.tensor.<locals>.tensor'>`"):
        TorchBaseTest.run_compare_torch(
            input_shape,
            ISTFTModel(),
            backend=backend,
            compute_unit=compute_unit
        )
```
@alealv - Your branch is a couple months old at this point. Please rebase your changes on the current tip of main. You probably want to squash all your changes first. Once that is done, we can kick off a CI run.
@junpeiz - please re-review these changes.
Done
@alealv - did you forget to `git push`? I'm still seeing that the first commit in your branch that is not from you is from November 14th.
The last commit is from 1 hour ago: 13176430f36017302fcbb69e7fe4ff614abe0a57
Your most recent commit is not the issue.
The issue is that your first commit (950b1a06bf39e884ac73e0ae69539cb1ebed95b7) is committed on top of a `main` commit (7a0706281617d6398f2cf7a46f70a46909996710) which is from November 14.
If we were to merge your changes, they would get committed on top of the current tip of `main`. Your pull request branch is not a good reflection of that, since many changes have gone into `main` since November 14.
You basically need to do three things:
1 - Get the most recent changes from upstream main to your fork's main.
2 - Rebase this pull request branch on top of your updated main.
3 - Force push to this branch.
Sorry, I was rebasing over origin and not over upstream 😅
This is better. However, it's still not right. The most recent commit your branch has from `main` is now e1111237a43381263c72c78ae722ec2cf44513bb, which is still about five weeks old. Perhaps your upstream is set to somewhere other than this repository.
That's because the author's repo alealv/coremltools, forked from apple/coremltools, hasn't been synced. @alealv Could you sync your repo and do the rebase? Then it will move your commits onto the most recent head of apple/coremltools.
I hope it's correct now.
For some reason I had gitlab.com:coremltools1/coremltools as upstream instead of https://github.com/apple/coremltools
Your branch looks good now. Thanks.
Here is the CI run: https://gitlab.com/coremltools1/coremltools/-/pipelines/1127443513
Hmmm, CI is failing on functions I didn't touch.
I see now that it uses STFT underneath. I'll check this next week.
Nice! The CI is very close to green. Let me know when you fix it, and I will do another round of review. Thanks!
I've dived deep into the CI error.
- I didn't understand why it wasn't failing on my machine until I found this condition:
```python
def test_spectrogram(self, compute_unit, backend, input_shape, spec, power):
    if platform.machine() != "arm64":
        pytest.xfail(
            "rdar://108001659 ([PyTorch] Torchaudio Spectrogram Failed on Intel Machine)"
        )
```
- Why is this needed?
- On `arm64` it only fails when `power=None`.

Taking a closer look, if `power` is None the spectrogram function returns both real and imaginary signals. When `power` is 1 or 2, it computes the amplitude or the power of the signal. So, I think the problem arises because I misunderstood how the DFT matrix was computed. I thought the values were the conjugate of the original, but it seems that isn't the case.
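As a small illustration of the `power` semantics mentioned above (a sketch of the relationship only, not the torchaudio or coremltools implementation):

```python
import torch

x = torch.randn(1, 1024)
window = torch.hann_window(64)

# Complex spectrum: roughly what the spectrogram exposes when power=None
# (both the real and imaginary parts).
S = torch.stft(x, n_fft=64, window=window, return_complex=True)

amplitude = S.abs()        # power=1: amplitude (magnitude) spectrogram
power_spec = S.abs() ** 2  # power=2: power spectrogram
```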
My last commit should do the trick. Also, I couldn't find any of the runs of TestSTFT, to verify if those were passing.
> Why is this needed?
When the Spectrogram was implemented, there was a quite strange failure on Intel machines, while M-chip machines work fine. We marked it as xfail while we are debugging it.
> On `arm64` it only fails when `power=None`. Taking a closer look, if `power` is None the spectrogram function returns both real and imaginary signals. When `power` is 1 or 2, it computes the amplitude or the power of the signal. So, I think the problem arises because I misunderstood how the DFT matrix was computed. I thought the values were the conjugate of the original, but it seems that isn't the case.
Great investigations! Thanks!
> Also, I couldn't find any of the runs of TestSTFT, to verify if those were passing.
Great catch! That's because the test is marked with @pytest.mark.slow and got skipped (the STFT tests are too slow). I will run it locally for you.
Also, kicked off another CI with your latest commit: https://gitlab.com/coremltools1/coremltools/-/pipelines/1133907719
@alealv Just want to check: have you run the tests locally on your machine? `pytest coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestSTFT::test_istft` failed on my local run, with at least the following case failures (I haven't finished a whole run):
- coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=None-length=None-return_complex=False]
- coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=None-length=None-return_complex=True]
- coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=False-length=None-return_complex=False]
- coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=False-length=None-return_complex=True]
- coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=True-length=None-return_complex=False]
- coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=None-length=None-return_complex=False]
- coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=None-length=None-return_complex=True]
- coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=False-length=None-return_complex=False]
- coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=False-length=None-return_complex=True]
To be safer, let me remove those @pytest.mark.slow decorators and re-run CI: https://gitlab.com/coremltools1/coremltools/-/pipelines/1134078191
Yes, all tests pass on my Intel machine:
Results (28832.46s (8:00:32)):
21240 passed
11160 skipped
That's for the whole TestSTFT
> Yes, all tests pass on my Intel machine.
Nice!
In the most recent run after removing the @pytest.mark.slow decorators, the M-chip Mac tests failed for ISTFT: https://gitlab.com/coremltools1/coremltools/-/pipelines/1134078191 Could you take a look?
For the Intel tests (test_py39_pytorch_intel), it seems pytest hit a Fatal Python error: Floating point exception... 😕 I don't know how to deal with that.
Maybe it entered some weird state; it would be best if you launch it once more.
And for the other one, test_py310_pytorch, unfortunately I cannot reproduce it.
I specifically ran the first test case that fails
_ TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=None-n_fft=16-num_frames=5-hop_length=None-win_length=None-window=None-center=False-normalized=False-onesided=None-length=None-return_complex=False] _
and got
Results (6.42s):
1 passed
so I guess something is done very differently on ARM machines.
Thank you for trying to reproduce it on your side! Yeah, there are a lot of things that could lead to different behaviors (OS version, Intel/ARM system architecture, etc.). Let me rerun it on CI.