coremltools icon indicating copy to clipboard operation
coremltools copied to clipboard

Add istft operation

Open alealv opened this issue 2 years ago • 33 comments

Close #2016

This code adds ISTFT op

It was a bit challenging to go through it because I needed to learn available mil operations and adapt the process. I based my self on this PR as suggested by junepiz

There are still many things I'm not confident with and I would like some help if possible:

  • Understanding notation used in MLOps
  • Proper testing of the function
  • Why does STFT doesn't use center. The same would apply to ISTFT
  • Check if the for loop in the OLA (overlap-add) function is fine or I need to use some special computation

alealv avatar Oct 24 '23 19:10 alealv

Thank you for your contributions! Left some comments.

Maybe it's because that this PR is still in draft stage (I noticed the Draft in PR title), but just want to mention that there are some undefined vars which triggers flake8 checking error: https://gitlab.com/coremltools1/coremltools/-/jobs/5539997442.

This should be fixed now.

alealv avatar Nov 15 '23 15:11 alealv

I still don't understand how you handle the center parameter in the STFT implementation

We should also handle it in the ISTFT in a similar manner.

alealv avatar Nov 15 '23 15:11 alealv

I still don't understand how you handle the center parameter in the STFT implementation

We should also handle it in the ISTFT in a similar manner.

@alealv That's a really good question.

As shown in STFT PyTorch doc, the center is only for padding the input. The center parameter is handled by PyTorch ops lowering, which means coremltools does not even see this center parameter when it looks at the Torch IR graph.

More specifically, when we lower the STFT op in STFT lowering function, the input_data already reflects the center parameter, so the center parameter is not used in mb.complex_stft. To better understand it, feel free to play with the test_stft test case in coremltools/converters/mil/frontend/torch/test/test_torch_ops.py: When input_shape is (1, 32) and center=False, the input_data in STFT lowering function is still (1, 32). However, when center is True, the input_data in STFT lowering function will become (1, 48).

junpeiz avatar Nov 23 '23 12:11 junpeiz

I finally fixed all issues and tests pass. The only result I found weird is here

        elif length and return_complex:
            with pytest.raises(ValueError, match="New var type `<class 'coremltools.converters.mil.mil.types.type_tensor.tensor.<locals>.tensor'>` not a subtype of existing var type `<class 'coremltools.converters.mil.mil.types.type_tensor.tensor.<locals>.tensor'>`"):
                TorchBaseTest.run_compare_torch(
                    input_shape,
                    ISTFTModel(),
                    backend=backend,
                    compute_unit=compute_unit
                )

alealv avatar Jan 02 '24 13:01 alealv

@alealv - Your branch is a couple months old at this point. Please rebase your changes on the current tip of main. You probably want to squash all your changes first. Once that is done, we can kick off a CI run.

@junpeiz - please re-review these changes.

TobyRoseman avatar Jan 02 '24 18:01 TobyRoseman

@alealv - Your branch is a couple months old at this point. Please rebase your changes on the current tip of main. You probably want to squash all your changes first. Once that is done, we can kick off a CI run.

@junpeiz - please re-review these changes.

Done

alealv avatar Jan 02 '24 19:01 alealv

@alealv - did you forget to git push? I'm still seeing the first commit in your branch that is not from you was from November 14th.

TobyRoseman avatar Jan 02 '24 20:01 TobyRoseman

@alealv - did you forget to git push? I'm still seeing the first commit in your branch that is not from you was from November 14th.

last commit is from 1 hour ago 13176430f36017302fcbb69e7fe4ff614abe0a57

alealv avatar Jan 02 '24 20:01 alealv

Your most recent commit is not the issue.

The issue is that your first commit (950b1a06bf39e884ac73e0ae69539cb1ebed95b7) is committed on top of a main commit (7a0706281617d6398f2cf7a46f70a46909996710) which is from November 14.

If we were to merge your changes, they would get committed on top of the current tip of main. Your pull request branch is not a good reflection of that, since many changes have been gone into main since November 14.

You basically need to do three things: 1 - Get the most recent changes from upstream main to your fork's main. 2 - Rebase this pull request branch on top of your updated main. 3 - Force push to this branch.

TobyRoseman avatar Jan 03 '24 00:01 TobyRoseman

Your most recent commit is not the issue.

The issue is that your first commit (950b1a0) is committed on top of a main commit (7a07062) which is from November 14.

If we were to merge your changes, they would get committed on top of the current tip of main. Your pull request branch is not a good reflection of that, since many changes have been gone into main since November 14.

You basically need to do three things: 1 - Get the most recent changes from upstream main to your fork's main. 2 - Rebase this pull request branch on top of your updated main. 3 - Force push to this branch.

Sorry, I was rebasing over origin and not over upstream 😅

alealv avatar Jan 03 '24 08:01 alealv

This is better. However it's still not right. The most recent commit your branch has from main is now e1111237a43381263c72c78ae722ec2cf44513bb which is still about five weeks old. Perhaps your upstream is set to somewhere other than this repository.

TobyRoseman avatar Jan 03 '24 21:01 TobyRoseman

It's due to that the author's repo alealv/coremltools that forked from apple/coremltools hasn't been synced. @alealv Could you sync your repo and do rebase? Then it will move your commits based on the most recent head of apple/coremltools.

junpeiz avatar Jan 03 '24 22:01 junpeiz

I hope now is correct.

For some reason I had gitlab.com:coremltools1/coremltools as upstream instead of https://github.com/apple/coremltools

alealv avatar Jan 04 '24 12:01 alealv

You're branch looks good now. Thanks.

Here is the CI run: https://gitlab.com/coremltools1/coremltools/-/pipelines/1127443513

TobyRoseman avatar Jan 04 '24 21:01 TobyRoseman

You're branch looks good now. Thanks.

Here is the CI run: https://gitlab.com/coremltools1/coremltools/-/pipelines/1127443513

hmmm, CI is failing con functions I didn't touch.

alealv avatar Jan 05 '24 09:01 alealv

I see now that it uses STFT underneath. I'll check this next week.

alealv avatar Jan 05 '24 11:01 alealv

Nice! The CI is very close to green. Let me know when you fix it, and I will do another round of review. Thanks!

junpeiz avatar Jan 08 '24 23:01 junpeiz

I've dived deep into the CI error.

  1. I didn't understand why it wasn't failing on my machine until I found this condition:
        def test_spectrogram(self, compute_unit, backend, input_shape, spec, power):
            if platform.machine() != "arm64":
                pytest.xfail(
                    "rdar://108001659 ([PyTorch] Torchaudio Spectrogram Failed on Intel Machine)"
                )
  • Why is this needed?
  1. On arm64 it only fails when power=None

Taking a closer look, if power is None the spectrogram function returns both real and image signals. When power is 1 or 2 then it computes the amplitud or power of the signal. So, I think the problem arise because I miss understood how the DFT matrix was computed. I thought the values were the conjugate of the original but it seems it isn't the case.

My last commit should do the trick. Also, I couldn't find any of the runs of TestSTFT, to verify if those were passing.

alealv avatar Jan 09 '24 14:01 alealv

Why is this needed?

When the Spectrogram was implemented, there was a quite strange failure on Intel Machine, and the M chip Machine works fine. We marked it as xfail while we are debugging it.

On arm64 it only fails when power=None. Taking a closer look, if power is None the spectrogram function returns both real and image signals. When power is 1 or 2 then it computes the amplitud or power of the signal. So, I think the problem arise because I miss understood how the DFT matrix was computed. I thought the values were the conjugate of the original but it seems it isn't the case.

Great investigations! Thanks!

Also, I couldn't find any of the runs of TestSTFT, to verify if those were passing.

Great catch! It's due to that test is marked by @pytest.mark.slow and got skipped (because STFT tests are too slow). I will run it locally for you.

Also, kicked off another CI with your latest commit: https://gitlab.com/coremltools1/coremltools/-/pipelines/1133907719

junpeiz avatar Jan 10 '24 18:01 junpeiz

@alealv Just want to check if you have run the tests locally on your machine? The pytest coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestSTFT::test_istft failed on my local run, with at least following cases failures (I haven't finished a whole run):

         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=None-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=None-length=None-return_complex=True]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=False-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=False-length=None-return_complex=True]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=True-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=None-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=None-length=None-return_complex=True]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=False-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=False-length=None-return_complex=True]

To be safer, let me remove those @pytest.mark.slow decorators and re-run CI: https://gitlab.com/coremltools1/coremltools/-/pipelines/1134078191

junpeiz avatar Jan 10 '24 22:01 junpeiz

@alealv Just want to check if you have run the tests locally on your machine? The pytest coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestSTFT::test_istft failed on my local run, with at least following cases failures (I haven't finished a whole run):

         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=None-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=None-length=None-return_complex=True]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=False-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=False-length=None-return_complex=True]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=False-onesided=True-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=None-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=None-length=None-return_complex=True]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=False-length=None-return_complex=False]
         - coremltools/converters/mil/frontend/torch/test/test_torch_ops.py:9590 TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=3-n_fft=16-num_frames=9-hop_length=5-win_length=8-window=None-center=True-normalized=True-onesided=False-length=None-return_complex=True]

Yes, all test pass in my Intel machine

Results (28832.46s (8:00:32)):
    21240 passed
    11160 skipped

That's for the whole TestSTFT

alealv avatar Jan 11 '24 19:01 alealv

Yes, all test pass in my Intel machine

Nice!

In the most recent run after removing the pytest.slow deco, the M-chip Mac tests failed for iSTFT: https://gitlab.com/coremltools1/coremltools/-/pipelines/1134078191 Could you take a look?

junpeiz avatar Jan 11 '24 21:01 junpeiz

The intel tests test_py39_pytorch_intel seems that pytest got a Fatal Python error: Floating point exception... 😕 I don't know how to work with that. Maybe it entered into some weird state, it would be best if you lunch it once more.

And for the other test_py310_pytorch unfortunately I cannot reproduce.

I ran specifically for the first test case that fails

_ TestSTFT.test_istft[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-channels=None-n_fft=16-num_frames=5-hop_length=None-win_length=None-window=None-center=False-normalized=False-onesided=None-length=None-return_complex=False] _

and got

Results (6.42s):
       1 passed

so I guess it's something is done very differently on ARM machines.

alealv avatar Jan 17 '24 10:01 alealv

Thank you for trying to reproduce it on your side! Yeah there are a lot of things could lead to different behaviors (OS version, Sys Arch of Intel/ARM, etc). Let me rerun it on CI.

junpeiz avatar Jan 17 '24 17:01 junpeiz