
fix type hints with tuple

Open ZhiyuanChen opened this issue 1 year ago • 17 comments

What does this PR do?

Fix type hints in modeling_bert.BertSelfAttention.

Without the , ... in Optional[Tuple[Tuple[torch.FloatTensor, ...], ...]], mypy thinks the Tuple contains exactly one element, which leads to a Tuple index out of range [misc] error.
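As a minimal illustration of the mypy behavior (plain floats stand in for torch.FloatTensor so the sketch runs without torch; second_entry is a hypothetical helper, not transformers code):

```python
from typing import Optional, Tuple

# Tuple[float] means "a tuple of exactly one float", so mypy rejects t[1]
# with "Tuple index out of range [misc]".
# Tuple[float, ...] means "a tuple of any number of floats", so t[1] passes.

def second_entry(past: Optional[Tuple[float, ...]]) -> Optional[float]:
    if past is None:
        return None
    # Under the annotation Tuple[float], this index would be flagged by mypy;
    # with Tuple[float, ...] it type-checks.
    return past[1]

print(second_entry((0.1, 0.2)))  # → 0.2
```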

Before submitting

  • [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [ ] Did you read the contributor guideline, Pull Request section?
  • [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

@ArthurZucker @Narsil @ydshieh

ZhiyuanChen avatar Apr 03 '24 10:04 ZhiyuanChen

Hi @ZhiyuanChen

Thank you for opening the PR. @ArthurZucker this makes sense to me, but we might need to change this in many places. Any comment on what to do in this PR? For example, should we change every occurrence of

past_key_value: Optional[Tuple[Tuple[torch.FloatTensor]]]

to what has been done in this PR?

ydshieh avatar Apr 03 '24 12:04 ydshieh

Hi @ydshieh

Thank you for your prompt response.

I'm not quite familiar with the transformers codebase, but from what I read about the single model file philosophy, could the copy mechanism (# Copied from <predecessor_model>.<function>) handle this?

ZhiyuanChen avatar Apr 03 '24 13:04 ZhiyuanChen

I also have a question... Since the length of past_key_value is fixed, maybe we should use Optional[Tuple[Tuple[torch.FloatTensor, torch.FloatTensor]]] instead?

ZhiyuanChen avatar Apr 03 '24 13:04 ZhiyuanChen

I'm not quite familiar with the transformers codebase, but from what I read about the single model file philosophy, could the copy mechanism (# Copied from <predecessor_model>.<function>) handle this?

Copied from

is a marker stating that one place is copied from another, and the check uses this information to make sure they are indeed copies (i.e., identical). It can automatically correct some copies, but that depends on how those statements are written.
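A simplified, hypothetical sketch of the idea behind such a check (the real check in the transformers repo is more involved and can also rewrite the copied body in place):

```python
import re

# Find "# Copied from <path>" markers and report which definition
# each marker covers. This is an illustrative toy, not the real check.
MARKER = re.compile(r"#\s*Copied from\s+(\S+)")

def find_copies(source: str):
    """Return (source_path, definition_name) pairs for each marker."""
    copies = []
    lines = source.splitlines()
    for i, line in enumerate(lines):
        m = MARKER.search(line)
        if m:
            # The marker applies to the next def/class that follows it;
            # a real check would then diff that body against the original.
            for follow in lines[i + 1:]:
                d = re.match(r"\s*(?:def|class)\s+(\w+)", follow)
                if d:
                    copies.append((m.group(1), d.group(1)))
                    break
    return copies

sample = '''
# Copied from transformers.models.bert.modeling_bert.BertSelfAttention
class RobertaSelfAttention:
    pass
'''
print(find_copies(sample))
```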

ydshieh avatar Apr 03 '24 13:04 ydshieh

Optional[Tuple[Tuple[torch.FloatTensor, torch.FloatTensor]]]

There is at least one place where ... is needed, I believe, since it depends on the number of hidden layers

ydshieh avatar Apr 03 '24 13:04 ydshieh

Optional[Tuple[Tuple[torch.FloatTensor, torch.FloatTensor]]]

There is at least one place where ... is needed, I believe, since it depends on the number of hidden layers

So... Optional[Tuple[Tuple[torch.FloatTensor, torch.FloatTensor], ...]] ?

ZhiyuanChen avatar Apr 03 '24 13:04 ZhiyuanChen

I think so, but let's wait @ArthurZucker 's comment first.

ydshieh avatar Apr 03 '24 14:04 ydshieh

Hey! Sorry, I have no idea; @Rocketknight1 is the TYPING king! 🤗

ArthurZucker avatar Apr 05 '24 08:04 ArthurZucker

Hi @ZhiyuanChen, you're right that we're quite inconsistent with this stuff. In a lot of our codebase, we don't strictly follow MyPy, and we use Tuple[x] to mean Tuple[x, ...].

If you want to make a PR to fix it, I think that's cool, but I think that PR should fix it across a lot of models at once - or else we'll have to individually review/merge 1000 different PRs for individual functions/classes! Also, note that the past_key_value argument passed to a single layer may have a different structure than the past_key_values argument passed to the whole model.

Rocketknight1 avatar Apr 05 '24 15:04 Rocketknight1

Hi @ZhiyuanChen, you're right that we're quite inconsistent with this stuff. In a lot of our codebase, we don't strictly follow MyPy, and we use Tuple[x] to mean Tuple[x, ...].

If you want to make a PR to fix it, I think that's cool, but I think that PR should fix it across a lot of models at once - or else we'll have to individually review/merge 1000 different PRs for individual functions/classes! Also, note that the past_key_value argument passed to a single layer may have a different structure than the past_key_values argument passed to the whole model.

Understood. I'll try to submit a PR next week.

ZhiyuanChen avatar Apr 05 '24 15:04 ZhiyuanChen

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar May 04 '24 08:05 github-actions[bot]

My apologies, I was busy preparing for my finals. I'll definitely update this week!

ZhiyuanChen avatar May 06 '24 05:05 ZhiyuanChen

Optional[Tuple[Tuple[torch.FloatTensor, torch.FloatTensor]]]

There is at least one place where ... is needed, I believe, since it depends on the number of hidden layers

So... Optional[Tuple[Tuple[torch.FloatTensor, torch.FloatTensor], ...]] ?

@ydshieh You are certainly right.

The type of past_key_value should be just Tuple[Tensor, Tensor], and the type of past_key_values should be Tuple[Tuple[Tensor, Tensor], ...].
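That distinction can be sketched as follows (plain floats stand in for tensors, and num_layers is a hypothetical value):

```python
from typing import Tuple

# Per-layer cache: exactly one (key, value) pair, so a fixed-length tuple type.
PastKeyValue = Tuple[float, float]
# Per-model cache: one pair per hidden layer, so a variable-length tuple type.
PastKeyValues = Tuple[PastKeyValue, ...]

num_layers = 3  # hypothetical
cache: PastKeyValues = tuple((1.0, 2.0) for _ in range(num_layers))
print(len(cache), cache[0])
```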

Have you considered tools like pre-commit to run additional static checks beyond ruff?
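For reference, a minimal hypothetical .pre-commit-config.yaml wiring up ruff and mypy might look like this (the rev pins are illustrative, not recommendations):

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4  # illustrative pin
    hooks:
      - id: ruff
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0  # illustrative pin
    hooks:
      - id: mypy
```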

ZhiyuanChen avatar May 08 '24 07:05 ZhiyuanChen

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jun 01 '24 08:06 github-actions[bot]

@Rocketknight1 Could you please review this PR? Also, I was wondering if you have plans to include MyPy and other checks like pre-commit. I understand it is a huge task and can be extremely painful, but I'm happy to help.

ZhiyuanChen avatar Jun 05 '24 12:06 ZhiyuanChen

Hi @ZhiyuanChen, I don't think we will ever achieve complete MyPy compatibility, unfortunately! Many of our functions are extremely polymorphic, and even enumerating all the possibilities in the return type would get really painful.
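A toy example of why fully precise annotations get painful: when the return arity depends on a runtime flag, no single tuple type covers both cases (toy function, not real transformers code):

```python
from typing import Tuple, Union

def forward(x: float, output_attentions: bool = False) -> Union[Tuple[float], Tuple[float, float]]:
    # The return shape depends on a flag, so the annotation must be a union;
    # typing.overload can narrow this, but the overloads multiply quickly
    # once several such flags combine.
    if output_attentions:
        return (x, 0.5)
    return (x,)

print(forward(1.0), forward(1.0, output_attentions=True))
```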

Also, I'm not sure I can review the PR yet - there are still a lot of merge conflicts, and some of the quality checks are failing! Try:

  1. Rebasing the PR onto the latest version of transformers
  2. Running make style afterwards to fix any formatting issues

Rocketknight1 avatar Jun 05 '24 14:06 Rocketknight1

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jun 30 '24 08:06 github-actions[bot]