transformers
fix type hints with tuple
What does this PR do?
Fix type hints in modeling_bert.BertSelfAttention.
Without `, ...` in `Optional[Tuple[Tuple[torch.FloatTensor, ...], ...]]`, mypy thinks the Tuple consists of only one element, which leads to a `Tuple index out of range [misc]` error.
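As a hedged illustration of the mypy behaviour described above (using plain floats as stand-ins for `torch.FloatTensor` so the snippet runs without torch; names are made up for the example):

```python
from typing import Optional, Tuple

# To mypy, Tuple[float] is a tuple of EXACTLY one float, while
# Tuple[float, ...] is a tuple of any length. The same applies to
# the nested past_key_value tuples in modeling_bert.
SingleElement = Tuple[float]        # values[1] -> "Tuple index out of range [misc]"
VariableLength = Tuple[float, ...]  # values[1] type-checks fine

def second(values: VariableLength) -> Optional[float]:
    # Runtime guard; mypy only accepts values[1] because of the ellipsis.
    return values[1] if len(values) > 1 else None

print(second((1.0, 2.0, 3.0)))  # prints 2.0
```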
Before submitting
- [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
@ArthurZucker @Narsil @ydshieh
Hi @ZhiyuanChen
Thank you for opening the PR. @ArthurZucker this makes sense to me, but we might need to change this in many places. Any comment on what to do in this PR? For example, change every occurrence of
past_key_value: Optional[Tuple[Tuple[torch.FloatTensor]]]
to what has been done in this PR?
Hi @ydshieh
Thank you for your prompt response.
I'm not quite familiar with transformers, but as I read from the single model file philosophy, could the copying function (`# Copied from <predecessor_model>.<function>`) work?
I also have a question... As the length of past_key_value is fixed, maybe we should use `Optional[Tuple[Tuple[torch.FloatTensor, torch.FloatTensor]]]` instead?
I'm not quite familiar with transformers, but as I read from the single model file philosophy, could the copying function (`# Copied from <predecessor_model>.<function>`) work?
`Copied from` is a marker stating that a place is copied from another one, and the check uses this information to make sure they are indeed copies (i.e. must be identical). It can be used to correct some copies automatically, but it depends on how we write those statements.
Optional[Tuple[Tuple[torch.FloatTensor, torch.FloatTensor]]]
There is at least one place where we need to use `...`, I believe, as it depends on the number of hidden layers.
Optional[Tuple[Tuple[torch.FloatTensor, torch.FloatTensor]]]
There is at least one place to use `...` I believe, as it depends on the number of hidden layers
So... Optional[Tuple[Tuple[torch.FloatTensor, torch.FloatTensor], ...]] ?
I think so, but let's wait @ArthurZucker 's comment first.
Hey! Sorry, I have no idea; @Rocketknight1 is the TYPING king! 🤗
Hi @ZhiyuanChen, you're right that we're quite inconsistent with this stuff. In a lot of our codebase, we don't strictly follow MyPy, and we use Tuple[x] to mean Tuple[x, ...].
If you want to make a PR to fix it, I think that's cool, but I think that PR should fix it across a lot of models at once - or else we'll have to individually review/merge 1000 different PRs for individual functions/classes! Also, note that the past_key_value argument passed to a single layer may have a different structure than the past_key_values argument passed to the whole model.
Understood. I'll try to submit a PR next week.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
My apologies, I was busy preparing for my finals. I'll definitely update this week!
Optional[Tuple[Tuple[torch.FloatTensor, torch.FloatTensor]]]
There is at least one place to use `...` I believe, as it depends on the number of hidden layers
So... Optional[Tuple[Tuple[torch.FloatTensor, torch.FloatTensor], ...]]?
@ydshieh You are certainly right.
The type of past_key_value should be just Tuple[Tensor, Tensor], and the type of past_key_values should be Tuple[Tuple[Tensor, Tensor], ...].
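The distinction could be sketched like this (a minimal, hypothetical sketch using `float` as a stand-in for `torch.FloatTensor` so it runs without torch; the function names are illustrative, not the actual transformers signatures):

```python
from typing import Optional, Tuple

FloatTensor = float  # stand-in for torch.FloatTensor in this sketch

# Per-layer cache: exactly one (key, value) pair.
PastKeyValue = Tuple[FloatTensor, FloatTensor]
# Model-level cache: one pair per hidden layer, hence the trailing ", ...".
PastKeyValues = Tuple[PastKeyValue, ...]

def layer_forward(past_key_value: Optional[PastKeyValue] = None) -> int:
    # A single layer sees one (key, value) pair.
    return 0 if past_key_value is None else len(past_key_value)

def model_forward(past_key_values: Optional[PastKeyValues] = None) -> int:
    # The model sees one pair per hidden layer.
    return 0 if past_key_values is None else len(past_key_values)

print(layer_forward((1.0, 2.0)))          # 2: key and value
print(model_forward(((1.0, 2.0),) * 12))  # 12: one pair per layer
```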
Have you ever considered tools like pre-commit to perform additional static checks beyond ruff?
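For reference, a minimal `.pre-commit-config.yaml` along those lines might look like the following (a sketch only; the revs are illustrative placeholders, and the hook selection would need discussion):

```yaml
# Sketch only: revs are placeholders, not pinned recommendations.
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4
    hooks:
      - id: ruff      # lint, mirroring the existing ruff checks
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0
    hooks:
      - id: mypy      # incremental static type checking
```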
@Rocketknight1 Could you please review this PR?
Also, I was wondering if you have plans to include MyPy and other checks like pre-commit.
I understand it is a huge task and can be extremely painful, but I'm happy to help.
Hi @ZhiyuanChen, I don't think we will ever achieve complete MyPy compatibility, unfortunately! Many of our functions are extremely polymorphic, and even enumerating all the possibilities in the return type would get really painful.
Also, I'm not sure I can review the PR yet: there are still a lot of conflicts in the merge itself, and some of the quality checks are failing! Try:
- Rebasing the PR onto the latest version of `transformers`
- Running `make style` to fix any issues with the formatting afterwards