
fix: torch backend- gather fix

Open Kacper-W-Kozdon opened this pull request 1 year ago • 5 comments

PR Description

The previous implementation of gather() in the torch backend was incorrect but was not caught by pytest. I have corrected the vast majority of the issues. Pytest might still show some errors for edge cases due to the very limited functionality of the native torch.gather() and the various workarounds needed to recreate TensorFlow's ground-truth behaviour; these will likely be related to dimensionality and I'm still removing them (it all comes down to the conditions for reshaping and expanding). Side note: the official TensorFlow documentation for gather contains errors in the expression used to calculate the dimensions of the output.
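
For context, a minimal illustration (not part of the PR itself, just the behaviour the workarounds have to bridge) of the semantic gap between the two APIs:

    import tensorflow as tf
    import torch

    params_tf = tf.constant([[1, 2, 3], [4, 5, 6]])
    # tf.gather selects whole slices along `axis`; the output shape is
    # params.shape[:axis] + indices.shape + params.shape[axis + 1:]
    print(tf.gather(params_tf, tf.constant([2, 0]), axis=1).numpy())
    # [[3 1]
    #  [6 4]]

    params_pt = torch.tensor([[1, 2, 3], [4, 5, 6]])
    # torch.gather is element-wise: `index` must have the same number of
    # dimensions as `input`, so the indices first have to be expanded.
    print(torch.gather(params_pt, 1, torch.tensor([[2, 0], [2, 0]])))
    # tensor([[3, 1],
    #         [6, 4]])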

Related Issue

Closes https://github.com/unifyai/ivy/issues/26498

Checklist

  • [x] Did you add a function?
  • [ ] Did you add the tests?
  • [ ] Did you run your tests and are your tests passing?
  • [x] Did pre-commit not fail on any check?
  • [x] Did you follow the steps we provided?

Socials

Kacper-W-Kozdon avatar Dec 14 '23 19:12 Kacper-W-Kozdon

Reformatting Task Checklist

IMPORTANT NOTICE 🚨:

The Ivy Docs represent the ground truth for the task descriptions and this checklist should only be used as a supplementary item to aid with the review process.

LEGEND 🗺:

  • ❌ : Check item is not completed.
  • ✅ : Check item is ready for review.
  • 🆘 : Stuck/Doubting implementation (PR author should add comments explaining why).
  • ⏩ : Check is not applicable to function (skip).
  • 🆗 : Check item is already implemented and does not require any edits.

CHECKS 📑:

    • [ ] 🆗: Make sure that the aforementioned methods are added into the correct category-specific parent class, such as ivy.ArrayWithElementwise, ivy.ContainerWithManipulation etc.
    • [ ] 🆗: Add the correct Docstrings to every function and its relevant methods, including those you did not implement yourself. The following should be added:
        • [ ] 🆗: The function's Array API standard description in ivy/functional/general.py. If the function is not part of the Array API standard then a description of similar style should be added to the same file. The following modifications should be made to the description:
          • [ ] 🆗: Remove type definitions in the Parameters and Returns sections.
          • [ ] 🆗: Add out to the Parameters section if function accepts an out argument.
          • [ ] 🆗: Replace out with ret in the Returns section.
    • [ ] 🆗: Add thorough Docstring Examples for every function and its relevant methods and ensure they pass the docstring tests.

      Functional Examples in ivy/functional/general.py.

        • [ ] 🆗: Cover all possible variants for each of the arguments independently (not combinatorily).
        • [ ] 🆗: Vary the values and input shapes considerably between examples.
        • [ ] 🆗: Start out simple and get more complex with each example.
        • [ ] 🆗: Show an example with:
          • [ ] 🆗: out unused.
          • [ ] 🆗: out used to update a new array y.
          • [ ] 🆗: out used to inplace update the input array x (if x has the same dtype and shape as the return).
        • [ ] 🆗: If broadcasting is relevant for the function, then show examples which highlight this.

      Nestable Function Examples in ivy/functional/general.py. Only if the function supports nestable operations.

        • [ ] 🆗: Add an example that passes in an ivy.Container instance in place of one of the arguments.
        • [ ] 🆗: Add an example that passes in ivy.Container instances for multiple arguments.

      Container Static Method Examples in ivy/container/general.py.

        • [ ] 🆗: The example from point (6.f) should be replicated, but added to the ivy.Container static method docstring, with ivy.<func_name> replaced with ivy.Container.static_<func_name> in the example.
        • [ ] 🆗: The example from point (6.g) should be replicated, but added to the ivy.Container static method docstring, with ivy.<func_name> replaced with ivy.Container.static_<func_name> in the example.

      Array Instance Method Example in ivy/array/general.py.

        • [ ] 🆗: Call this instance method of the ivy.Array class.

      Container Instance Method Example in ivy/container/general.py.

        • [ ] 🆗: Call this instance method of the ivy.Container class.

      Array Operator Examples in ivy/array/array.py.

        • [ ] ⏩: Call the operator on two ivy.Array instances.
        • [ ] ⏩: Call the operator with an ivy.Array instance on the left and ivy.Container on the right.

      Array Reverse Operator Example in ivy/array/array.py.

        • [ ] ⏩: Call the operator with a Number on the left and an ivy.Array instance on the right.

      Container Operator Examples in ivy/container/container.py.

        • [ ] ⏩: Call the operator on two ivy.Container instances containing Number instances at the leaves.
        • [ ] ⏩: Call the operator on two ivy.Container instances containing ivy.Array instances at the leaves.
        • [ ] ⏩: Call the operator with an ivy.Container instance on the left and ivy.Array on the right.

      Container Reverse Operator Example in ivy/container/container.py.

        • [ ] ⏩: Following the example in the ivy.Container.__radd__ docstring, with the operator called with a Number on the left and an ivy.Container instance on the right.

      Tests

        • [ ] ✅: Docstring examples tests passing.
        • [ ] ✅: Lint checks passing.

Kacper-W-Kozdon avatar Dec 14 '23 19:12 Kacper-W-Kozdon

@AnnaTz this is the pull request I was asked to tag you in, for torch gather. Pytest might still throw some issues for edge cases (mostly related to dimensionality) and I'm still fixing them, but it seems I've reached the point where it's all just a matter of finding the right conditions for reshaping and expanding params and indices before feeding them into the native torch.gather(). Some of those issues are related to errors I found in TensorFlow's official documentation, and TensorFlow's outputs are used as the ground truth.
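
A rough sketch (under simplifying assumptions, not the actual PR code) of the strategy described above for the simplest case (axis=0, batch_dims=0): expand the indices to match torch.gather()'s requirements, call the native function, then reshape to TensorFlow's output shape:

    import torch

    def gather_axis0_sketch(params: torch.Tensor, indices: torch.Tensor) -> torch.Tensor:
        # TensorFlow's output shape for axis=0, batch_dims=0:
        # indices.shape + params.shape[1:]
        out_shape = list(indices.shape) + list(params.shape[1:])
        flat_indices = indices.reshape(-1)
        # torch.gather needs `index` with the same ndim as `input`, so expand
        # the flattened indices across the remaining (non-gathered) dimensions.
        expanded = flat_indices.view(-1, *([1] * (params.ndim - 1))).expand(
            -1, *params.shape[1:]
        )
        return torch.gather(params, 0, expanded).reshape(out_shape)

    params = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    indices = torch.tensor([[2, 0], [1, 1]])
    print(gather_axis0_sketch(params, indices))
    # tensor([[[5., 6.], [1., 2.]], [[3., 4.], [3., 4.]]])
    # matches tf.gather(params, indices, axis=0)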

Kacper-W-Kozdon avatar Dec 14 '23 19:12 Kacper-W-Kozdon

@AnnaTz ready to review! Probably even to merge (no errors), but due to my previous experiences with pytest failing, for some reason, to detect serious issues, I'd greatly appreciate it if you double-checked. The function could probably also use some polishing to make it simpler; there may still be some redundant conditions, and those could use a look as well. On the plus side, ivy.gather() on the torch backend finally uses torch methods instead of borrowing from other libraries. The results of the command `pytest -s -v --full-trace --debug=pytestdebug.log ivy_tests/test_ivy/test_functional/test_core/test_general.py::test_gather[cpu-torch-False-False]` are attached: pytestdebug.log

UPDATE: I ran more tests with pytest and there was indeed an oversight: adjusting the dimensions to meet torch.gather() requirements also requires adjusting the axis (if params.shape() goes from torch.Size([2]) to torch.Size([1, 2]), then the axis has to compensate for the insertion of that extra dimension). I've already fixed and committed that. There is also some small bug somewhere in the conditions which prevents torch.Tensor.expand() from being applied; I'm looking into that.
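
To illustrate the axis compensation mentioned above (a hypothetical minimal example, not the PR code): when params gains a leading dimension so that torch.gather() sees tensors of equal rank, the gather axis has to shift by the number of inserted dimensions.

    import torch

    params = torch.tensor([10.0, 20.0])       # torch.Size([2])
    indices = torch.tensor([[1, 0], [0, 1]])  # torch.Size([2, 2])
    axis = 0

    # torch.gather needs params.ndim == indices.ndim, so insert a leading dim...
    expanded_params = params.unsqueeze(0).expand(indices.shape[0], -1)  # torch.Size([2, 2])
    # ...and compensate the axis for the inserted dimension.
    adjusted_axis = axis + (expanded_params.ndim - params.ndim)

    print(torch.gather(expanded_params, adjusted_axis, indices))
    # tensor([[20., 10.],
    #         [10., 20.]])  -- same result as tf.gather(params, indices, axis=0)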

Kacper-W-Kozdon avatar Dec 23 '23 00:12 Kacper-W-Kozdon

Hey @saeedashrraf, could you please take a look at this PR, given that it's about gather? Thanks 😄

vedpatwardhan avatar Dec 26 '23 03:12 vedpatwardhan

Hey @saeedashrraf, could you please take a look at this PR, given that it's about gather? Thanks 😄

I'm still removing some issues from the code, but I'd appreciate any advice :) There's also a possibility that, in this case, it's not worth doing it the way I'm attempting and it might be better to revert gather to the version from some older commit. However, I would still like to get this piece of code to pass all the tests.

UPDATE: Ok, I think it can actually be significantly simplified in a way that won't result in weird behaviour. I'm on it; it's a bit of rewriting, but the code should end up much simpler.

UPDATE: Seems to be fixed :) But I'd appreciate a double-check of the pytest run, because bugs have already slipped past it twice for me and I'm not sure why (it's probably related to the generation of Hypothesis examples when rerunning pytest). I'm attaching my latest pytestdebug.log for you to check and compare against if you run your own. I ran pytest just for the torch backend here: `pytest -s -v --full-trace --debug=pytestdebug.log ivy_tests/test_ivy/test_functional/test_core/test_general.py::test_gather[cpu-torch-False-False]` pytestdebug.log

Kacper-W-Kozdon avatar Dec 26 '23 10:12 Kacper-W-Kozdon

@saeedashrraf the PR is ready for review.

Kacper-W-Kozdon avatar Jan 09 '24 18:01 Kacper-W-Kozdon

Hey @saeedashrraf, could you please take a look at this PR, given that it's about gather? Thanks 😄

Hey @vedpatwardhan, is there any chance of a review :) ? It seems the person assigned to review this PR has not been active over the last month, and the PR is probably ready to be merged (but a check through the code and a separate pytest run would definitely not hurt).

Kacper-W-Kozdon avatar Jan 19 '24 12:01 Kacper-W-Kozdon

Hey @vedpatwardhan, is there any chance of a review :) ? It seems the person assigned to review this PR has not been active over the last month, and the PR is probably ready to be merged (but a check through the code and a separate pytest run would definitely not hurt).

Hey @Kacper-W-Kozdon, really sorry for the delay in reviewing. @Ishticode, could you please take a quick look at the changes to see if we can merge? Thanks 😄

vedpatwardhan avatar Jan 22 '24 04:01 vedpatwardhan

@reproduce_failure('6.78.1', b'AXicY2NkAAMoBaEZGRgZGZgYkABYGAACCAAQ')

Initially, I don't see anything that would cause a problem; I checked in a Jupyter notebook (https://github.com/Kacper-W-Kozdon/notebook-testing-ivy/blob/main/Testing_Python.ipynb), under the evaluated input In[3]. I'll check with the decorator and in the ivy_dev venv, but nothing is obvious so far and the notebook doesn't throw any errors.

Kacper-W-Kozdon avatar Jan 22 '24 17:01 Kacper-W-Kozdon

Hi @Kacper-W-Kozdon, hi @vedpatwardhan. Thank you so much for all the effort. I will keep your PR on priority. It fixes several issues, but I ran test_gather with 200 examples on the torch backend from your branch and it seems to fail with the following value mismatch.

E       AssertionError:  the results from backend torch and ground truth framework tensorflow do not match
E        [1.]!=[0.9995117] 
E       
E       The mismatching elements are at `False` indices:
E       
E       [False] 
E       
E       
E       Falsifying example: test_gather(
E           backend_fw='torch',
E           on_device='cpu',
E           params_indices_others=(['float16', 'int32'],
E            array([-1.], dtype=float16),
E            array([[0, 0, 0],
E                   [0, 0, 0]], dtype=int32),
E            0,
E            0),
E           fn_name='gather',
E           test_flags=FunctionTestFlags(
E               ground_truth_backend='tensorflow',
E               num_positional_args=2,
E               with_out=False,
E               with_copy=False,
E               instance_method=False,
E               test_gradients=True,
E               test_trace=False,
E               transpile=False,
E               as_variable=[False],
E               native_arrays=[False],
E               container=[False],
E               precision_mode=False,
E               test_cython_wrapper=False,
E           ),
E       )
E       
E       You can reproduce this example by temporarily adding @reproduce_failure('6.78.1', b'AXicY2NkAAMoBaEZGRgZGZgYkABYGAACCAAQ') as a decorator on your test case

Do you have any idea where this is coming from, and are you able to fix it? Thanks for the effort and let me know if something is unclear.

I'm guessing that the assertion error comes from the flag test_gradients=True. Could it be related to the rtol or atol params not being set in the test? Do these two params also get passed to the gradient test? If they do, I think this might be the cause, since the raw outputs of the gather() function seem to be ok. I don't have enough experience to suggest good values for rtol and atol, but in the Deep Dive they are set to 1e-3. I will also double-check how dtypes are treated in the native torch.gather(); I don't see any reason for the other torch functions involved to affect dtypes, and this one shouldn't either.
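
As a quick sanity check on the tolerance hypothesis (an illustration with numpy.allclose, not the exact comparison helper the test suite uses), the mismatch above falls within 1e-3 but outside numpy's default tolerances:

    import numpy as np

    ret = np.array([0.9995117])  # torch backend result from the falsifying example
    gt = np.array([1.0])         # tensorflow ground truth

    print(np.allclose(ret, gt))                        # False (defaults: rtol=1e-5, atol=1e-8)
    print(np.allclose(ret, gt, rtol=1e-3, atol=1e-3))  # True  (the Deep Dive's 1e-3)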

I ran the test with an @example decorator added between @handle_test() and def test_gather():

    @example(
        params_indices_others=(
            ['float16', 'int32'],
            ivy.array([-1.]),
            ivy.array([[0, 0, 0], [0, 0, 0]]),
            0,
            0,
        ),
        test_flags=FunctionTestFlags(
            ground_truth_backend='tensorflow',
            num_positional_args=2,
            with_out=False,
            with_copy=False,
            instance_method=False,
            test_gradients=True,
            test_trace=False,
            transpile=False,
            as_variable=[False],
            native_arrays=[False],
            container=[False],
            precision_mode=False,
            test_cython_wrapper=False,
        ),
        fn_name="gather",
    )

However, I got a warning: "ivy_tests/test_ivy/test_functional/test_core/test_general.py::test_gather[cpu-torch-False-False] SKIPPED (Function not implemented in backend.)", even though the test is meant to run against the functional backend, so I'm not 100% sure this example was taken into account. The tests passed 100% (default number of examples: 25). pytestdebug.log

Kacper-W-Kozdon avatar Jan 22 '24 21:01 Kacper-W-Kozdon

Hey @Kacper-W-Kozdon, thank you for the suggestions. I don't think gradient tests should be disabled, and even with a tolerance added (I've set 1e-3; feel free to change it around) we get significant value mismatches. For example, one I got was:

E       AssertionError:  the results from backend torch and ground truth framework tensorflow do not match
E        [[0.20898438 0.12158203 0.12109375 0.04736328 0.        ]
E        [0.20898438 0.12158203 0.12109375 0.04736328 0.        ]]!=[[0.0625   0.0625   0.0625   0.046875 0.      ]
E        [0.0625   0.0625   0.0625   0.046875 0.      ]]

Could you please try test_gather with more examples by adding --num-examples 200?

Please let me know if anything is unclear. Thank you very much :)

Got it, I'll update my packages (my hypothesis package was behind) and rerun the tests with more examples. Once I see the details of the errors and the inputs, I'll try to isolate the part of the code which might be causing the issues and let you know.

Kacper-W-Kozdon avatar Jan 26 '24 13:01 Kacper-W-Kozdon

Hey @Kacper-W-Kozdon, thank you for the suggestions. I don't think gradient tests should be disabled, and even with a tolerance added (I've set 1e-3; feel free to change it around) we get significant value mismatches. For example, one I got was:

E       AssertionError:  the results from backend torch and ground truth framework tensorflow do not match
E        [[0.20898438 0.12158203 0.12109375 0.04736328 0.        ]
E        [0.20898438 0.12158203 0.12109375 0.04736328 0.        ]]!=[[0.0625   0.0625   0.0625   0.046875 0.      ]
E        [0.0625   0.0625   0.0625   0.046875 0.      ]]

Could you please try test_gather with more examples by adding --num-examples 200?

Please let me know if anything is unclear. Thank you very much :)

I ran the test with 200 examples and it passed. I repeated with 500 and ran into a similar error. I can 100% confirm that the last test called before the failed assertions was the gradient test. Gradients run after asserting that the outputs of the gather function are correct (types and raw values). Right now I have no clue why the gradients would fail while the direct outputs of gather seemingly match. Also, I used the @reproduce_failure decorator with the test that failed and received the message "E hypothesis.errors.DidNotReproduce: Expected the test to raise an error, but it completed successfully.". Could you try it on your end as well with the test that failed, to see if it can be reproduced and what the last function called was? In my case, the decorator was @reproduce_failure('6.97.0', b'AXicY2NkYmJmZmZiYGdhPMAAB4ztf37pNYAYQHz8ApCInDmDkYGZgZGRA6KeBAAygwEAIpcHTQ==') but I suppose that's only a local thing (I am still learning how the pytest and hypothesis packages handle the testing).

I'm attaching the example which caused the error (the indices had a big shape, so it's in a .txt for readability).

falsifying_example.txt

Kacper-W-Kozdon avatar Jan 26 '24 19:01 Kacper-W-Kozdon

Hey @Ishticode, I'll try to either open a new issue or modify the test for gather within this PR (or rather, the helper function used in it) to limit the values in the input array to what can be correctly represented in a given dtype. The current test uses a helper function which does not have arguments to control the range of input values, but it calls dtypes_and_values(), which has such arguments. It seems that this (when fed into the gradient test) might be causing the errors, rather than the gather() function itself.

Kacper-W-Kozdon avatar Feb 03 '24 17:02 Kacper-W-Kozdon

I'd need the #28168 PR reviewed and merged first in order to solve the issue of some tests failing due to the mismatch between values and dtype (values too large for the dtype to retain the necessary precision). Based on the feedback from Ved, that should be enough to clear all the tests, given that the errors show up in the gradient test triggered through the test_gradients flag in test_gather.
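
A quick illustration of the dtype/precision problem being referred to (hypothetical values, not taken from the failing tests):

    import torch

    # float16 carries only ~11 bits of significand, so large generated values
    # silently lose low-order bits and gradient comparisons drift past tight tolerances.
    x = torch.tensor(2048.0, dtype=torch.float16)
    print(x + 1)  # tensor(2048., dtype=torch.float16) -- the +1 is lost
    # the nearest float16 below 1.0, i.e. the 0.9995117 seen in the earlier mismatch
    print(torch.tensor(0.99951171875, dtype=torch.float16))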

Kacper-W-Kozdon avatar Feb 04 '24 16:02 Kacper-W-Kozdon

Ready for review (and merge) together with https://github.com/unifyai/ivy/pull/28168

Kacper-W-Kozdon avatar Feb 06 '24 20:02 Kacper-W-Kozdon

@Ishticode would you mind taking a look at https://github.com/unifyai/ivy/pull/28168 alongside this PR? The tests shouldn't throw any more errors once these two are merged.

Kacper-W-Kozdon avatar Feb 13 '24 12:02 Kacper-W-Kozdon

Hi @Kacper-W-Kozdon, thanks for the fixes. The tests all look fine. You really have put a lot of effort into this :) I might ask @vedpatwardhan to have a quick look before merging. Sorry for the delays on my end. One question I still have: in your initial comment you said the torch backend implementation was wrong and pytest wasn't catching it. As far as I can see, we have not broadened the test to include newer cases; we have only restricted it somewhat with safety and tolerances. So how do we know that the cases that were previously undetected are still not going unnoticed?

I'm not entirely sure, but I think my pytest was not installed properly at that time. gather() was my first contribution and the tests were passing even though they shouldn't have, which I didn't realise at first (after I reinstalled everything, they were indeed throwing errors).

As for the recent errors, https://github.com/unifyai/ivy/pull/28168 solved them; the tests for gather were generating values too large for their dtype, which was causing assertion errors beyond the tolerances. So it's the combination of both of these PRs that solved the issues.

Kacper-W-Kozdon avatar Feb 27 '24 10:02 Kacper-W-Kozdon

lgtm! Could you please revert the changes to the demos submodule? The PR should be good to merge once that's done @Ishticode. Thanks @Kacper-W-Kozdon 😄

Done, docs/demos changes reverted 👍

Kacper-W-Kozdon avatar Mar 04 '24 09:03 Kacper-W-Kozdon

@vedpatwardhan @Ishticode I explicitly added the tolerance for bfloat16; hopefully this will prevent the gradient test from tripping the tolerance assertion on this dtype. Other than that, it seems to be ready.
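
For context on why bfloat16 warrants its own tolerance (a minimal illustration, not the test code): bfloat16 keeps float32's exponent range but only 8 significand bits, so values round much more coarsely than float16 near 1.

    import torch

    # spacing of bfloat16 near 1.0 is 2**-7 = 0.0078125, so 1.01 is not representable
    print(torch.tensor(1.01, dtype=torch.bfloat16))  # tensor(1.0078, dtype=torch.bfloat16)
    print(torch.tensor(1.01, dtype=torch.float16))   # tensor(1.0098, dtype=torch.float16)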

Kacper-W-Kozdon avatar Mar 13 '24 11:03 Kacper-W-Kozdon

This PR has been labelled as stale because it has been inactive for more than 7 days. If you would like to continue working on this PR, then please add another comment or this PR will be closed in 7 days.

ivy-seed avatar Mar 22 '24 05:03 ivy-seed