
[TensorFlow] Adding GroupViT

Open ariG23498 opened this issue 3 years ago • 18 comments

Adding TensorFlow version of GroupViT

This PR adds the TensorFlow version of GroupViT.

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [x] Did you write any new necessary tests?

CC: @LysandreJik @Rocketknight1

ariG23498 avatar Jul 05 '22 05:07 ariG23498

The documentation is not available anymore as the PR was closed or merged.

Tagging @NielsRogge @Rocketknight1 @gante for the PR review!

ariG23498 avatar Aug 03 '22 08:08 ariG23498

@ariG23498 The parsing issue seen in the check_repository_consistency tests is arising because of @keras_serializable decorators being below # Copied from statements. If I check out your branch, I can run make fix-copies successfully if I move the decorator to be above the comment.

amyeroberts avatar Aug 05 '22 11:08 amyeroberts

Hey @ariG23498 👋

Regarding failing tests:

  • The tests in run_tests_torch_and_tf seem to need a slightly higher tolerance. Let's leave those as they are for now (failing), as we may be able to fix them during the PR review process :)
  • The other error is because the attribute base_model_prefix does not match the name of the attribute that holds the model, i.e. it should be changed to base_model_prefix = "group_vit" (notice the _ :D ) [base_model_prefix wrong -> the line does not get matched with the correct class -> the exception is raised in the function]

gante avatar Aug 09 '22 10:08 gante
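The base_model_prefix rule described above can be sketched with a minimal, hypothetical class (plain Python, no TF needed). The class names are stand-ins, not the PR's actual code; the key invariant is that the prefix string equals the attribute name holding the main layer (the thread settles on the groupvit spelling a few comments below):

```python
class TFGroupViTPreTrainedModelSketch:
    # Hypothetical stand-in for the real TFGroupViTPreTrainedModel.
    # The prefix must equal the attribute name that holds the main layer.
    base_model_prefix = "groupvit"

class TFGroupViTModelSketch(TFGroupViTPreTrainedModelSketch):
    def __init__(self):
        # Attribute name deliberately matches base_model_prefix above,
        # so the weight cross-loading machinery can locate the base model.
        self.groupvit = object()  # stand-in for TFGroupViTMainLayer

model = TFGroupViTModelSketch()
print(hasattr(model, model.base_model_prefix))  # True
```

If the attribute were named group_vit while the prefix said groupvit (or vice versa), the lookup would fail, which is exactly the consistency-check exception described above.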

Hey @gante thanks for the insights.

The other error is because the attribute base_model_prefix does not match the name of the attribute that holds the model, i.e. it should be changed to base_model_prefix = "group_vit" (notice the _ :D ) [base_model_prefix wrong -> the line does not get matched with the correct class -> the exception is raised in the function]

The PyTorch implementation has groupvit as the base_model_prefix. Would this be a problem if I changed the name for the TensorFlow port?

ariG23498 avatar Aug 09 '22 10:08 ariG23498

@ariG23498 Good point -- groupvit (no underscore) should be used everywhere then!

gante avatar Aug 09 '22 11:08 gante

Fixes #18543

CC: @NielsRogge

ariG23498 avatar Aug 09 '22 11:08 ariG23498

References

  • hard_softmax: https://gist.github.com/ariG23498/08cdae21637b8b61bdd6d21d11719fb3
  • resize_attention_map: https://gist.github.com/ariG23498/3777f8d9be25de8ae782256f5aacb2c5

ariG23498 avatar Aug 24 '22 10:08 ariG23498
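For context, a minimal TF sketch of a straight-through hard_softmax, written as an assumption of what the gist above covers rather than its exact contents:

```python
import tensorflow as tf

def hard_softmax(logits, dim=-1):
    # soft assignment probabilities
    y_soft = tf.nn.softmax(logits, axis=dim)
    # one-hot of the argmax along `dim`
    index = tf.argmax(y_soft, axis=dim)
    y_hard = tf.one_hot(index, depth=logits.shape[dim], axis=dim, dtype=y_soft.dtype)
    # straight-through estimator: hard one-hot values in the forward pass,
    # soft-softmax gradients in the backward pass
    return tf.stop_gradient(y_hard - y_soft) + y_soft

out = hard_softmax(tf.constant([[2.0, 1.0, 0.5]]))
print(out.numpy())  # one-hot: [[1. 0. 0.]]
```

Numerically the output equals the one-hot tensor, while gradients flow through y_soft because stop_gradient zeroes out the hard-minus-soft residual.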

@amyeroberts I have added the shape_list back to places which are copied from other parts of the repository. This was done to pass the make fixup.

ariG23498 avatar Aug 24 '22 14:08 ariG23498

Hey @amyeroberts and @gante I had to swap back in the shape_list and also the axis=range(len(shape_list(logits)))[dim] to pass the tests.

ariG23498 avatar Aug 30 '22 05:08 ariG23498
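The axis=range(len(shape_list(logits)))[dim] idiom mentioned above relies on Python range indexing to turn a possibly negative PyTorch-style dim into the non-negative axis that TF ops expect; a dependency-free sketch:

```python
# For a rank-4 tensor (rank = len(shape_list(logits)) in the real code),
# indexing a range by `dim` normalizes negative dims to positive axes.
rank = 4
print(range(rank)[-1])  # 3: dim=-1 becomes the last axis
print(range(rank)[1])   # 1: non-negative dims pass through unchanged
```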

Perhaps relevant to the TF-PT mismatch: https://github.com/huggingface/transformers/pull/18555#issuecomment-1230066514

gante avatar Aug 30 '22 12:08 gante

Perhaps relevant to the TF-PT mismatch: #18555 (comment)

Hey @gante I do not think this is the source of the problem.

nn.functional.interpolate has been used twice in the torch code: once in resize_attention_map and once in interpolate_pos_encoding.

I have a Colab notebook that takes care of the TF and PT equivalence for resize_attention_map. As for interpolate_pos_encoding, it has been taken from the ViT implementation (which is assumed to have no equivalence problem?).

ariG23498 avatar Aug 31 '22 02:08 ariG23498
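For reference, one common way to mirror torch.nn.functional.interpolate(..., mode="bilinear") in TF is tf.image.resize plus NCHW/NHWC transposes. This is a hedged sketch (the helper name is hypothetical, not the PR's actual code):

```python
import tensorflow as tf

def resize_like_interpolate(attn_nchw, size):
    # PyTorch tensors are NCHW; tf.image.resize expects NHWC.
    x = tf.transpose(attn_nchw, [0, 2, 3, 1])        # NCHW -> NHWC
    x = tf.image.resize(x, size, method="bilinear")  # spatial bilinear resize
    return tf.transpose(x, [0, 3, 1, 2])             # NHWC -> NCHW

x = tf.reshape(tf.range(16, dtype=tf.float32), [1, 1, 4, 4])
y = resize_like_interpolate(x, (8, 8))
print(tuple(y.shape))  # (1, 1, 8, 8)
```

Note that bit-exact PT/TF equivalence additionally depends on align_corners semantics, which the two APIs handle differently by default, so small numerical mismatches can remain even when the shapes agree.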

@ariG23498 With the selected seeds, it passes now on CircleCI. This depends also on hardware, so it may or may not work when running on different machines, sadly.

ydshieh avatar Sep 16 '22 16:09 ydshieh

It passes on GCP CPU-only VM too. For GPU, we might need different seeds, but we can do that later.

ydshieh avatar Sep 16 '22 16:09 ydshieh

tests/models/groupvit/test_modeling_tf_groupvit.py::TFGroupViTModelTest::test_keras_fit failed however. I will leave @ariG23498 to check this one 🙏

ydshieh avatar Sep 16 '22 17:09 ydshieh

Tagging @sgugger for a review (as core maintainer) 🎉

After this PR has 3 approvals (including Sylvain's), these are the instructions to finalize the PR:

  1. Add the TF weights for the model(s). Assuming you are logged in to the Hub in your terminal (if not, run huggingface-cli login) and are on this branch, run transformers-cli pt-to-tf --model-name foo/bar, where foo/bar is the target model repo. Assuming it passes the CLI's checks, it will open a PR on the model hub. Please tag @_nielsr, @_sgugger, @_joaogante (⚠️ removing the _) in the opened Hub PRs :)
  2. After the TF model weights are merged, make sure there is no from_pt=True left in the PR. Then, re-run the test suite for the model locally and confirm that all tests pass (command: NVIDIA_TF32_OVERRIDE=0 RUN_SLOW=1 py.test -vv tests/models/groupvit/test_modeling_tf_groupvit.py)

gante avatar Sep 19 '22 08:09 gante

The run_tests_hub job is failing for the time being. @ydshieh thinks it is unrelated to this PR. All the relevant tests pass.

Pinging @Rocketknight1 for the serving outputs bit (as mentioned by @sgugger here).

ariG23498 avatar Sep 20 '22 10:09 ariG23498

@Rocketknight1 The changes you have suggested are taken care of! @ydshieh the tests pass as well!

@gante @amyeroberts should we talk about the deprecated code used here, or should we have a different PR to handle this?

@gante over to you for saving the TF model to the Hub 🫂

ariG23498 avatar Sep 20 '22 17:09 ariG23498

FYI: The failing tests are not related to the GroupViT model.

ariG23498 avatar Sep 21 '22 15:09 ariG23498

The failure is unrelated. As long as all models have been moved, this PR can be merged.

sgugger avatar Sep 22 '22 11:09 sgugger

@gante over to you now! 🤗

ariG23498 avatar Sep 22 '22 11:09 ariG23498

@ariG23498 adding TF weights as PRs to the hub is now available to everyone -- you can have a go at it, following the instructions above!

I can also do it. Let me know how you want to proceed :)

gante avatar Sep 22 '22 14:09 gante

Thanks @gante I will give it a try myself!

ariG23498 avatar Sep 23 '22 09:09 ariG23498

@gante while running transformers-cli pt-to-tf --model-name nvidia/groupvit-gcc-yfcc I get the following error.

List of maximum hidden layer differences above the threshold (5e-05):
text_model_output.hidden_states[1]: 9.155e-05
text_model_output.hidden_states[2]: 9.155e-05
text_model_output.hidden_states[3]: 9.155e-05
text_model_output.hidden_states[4]: 9.155e-05
text_model_output.hidden_states[5]: 9.155e-05
text_model_output.hidden_states[6]: 9.155e-05
text_model_output.hidden_states[7]: 9.155e-05
text_model_output.hidden_states[8]: 9.155e-05
text_model_output.hidden_states[9]: 9.155e-05
text_model_output.hidden_states[10]: 9.155e-05
text_model_output.hidden_states[11]: 9.155e-05
text_model_output.hidden_states[12]: 9.155e-05
vision_model_output.hidden_states[1]: 2.441e-04
vision_model_output.hidden_states[2]: 7.629e-05
vision_model_output.hidden_states[3]: 7.629e-05

I think this is because of the seed issue that @ydshieh had pointed out.

How would you want me to proceed in this case?

ariG23498 avatar Sep 23 '22 10:09 ariG23498
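The threshold check in the log above can be illustrated with a small numpy sketch (simulated arrays, not the CLI's actual internals):

```python
import numpy as np

threshold = 5e-5  # the CLI's hidden-layer tolerance from the log above
rng = np.random.RandomState(0)
pt_hidden = rng.rand(1, 16, 32).astype("float32")  # simulated PT hidden state
tf_hidden = pt_hidden + np.float32(9.155e-5)       # simulate the reported offset
max_diff = float(np.max(np.abs(pt_hidden - tf_hidden)))
print(max_diff > threshold)  # True -> this layer would be flagged
```

Since the offset exceeds the tolerance, the CLI reports the layer, which is why every listed difference above sits just past 5e-05.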

@gante thanks for the help.

The PR for the TF weights on the HF Hub is here.

ariG23498 avatar Sep 23 '22 16:09 ariG23498

It seems ready to be merged, except for an inconsistency that arises from a # Copied from statement -- lmk if you need a hand!

gante avatar Sep 28 '22 14:09 gante

@gante I have taken care of the copied from inconsistency.

I think we are good to go.

ariG23498 avatar Sep 29 '22 00:09 ariG23498

Merged! Thank you for making the TF users happier, @ariG23498 🧡

gante avatar Sep 29 '22 09:09 gante