Add missing type hints

Open Rocketknight1 opened this issue 2 years ago • 124 comments

This issue is part of our Great Code Cleanup 2022. If you're interested in helping out, take a look at this thread, or come join us on Discord and talk with other contributors!

🚀 Add missing type hints

Type hints are used inconsistently in the transformers repo across both TF and PT models, and it'd be nice to make them a complete, consistent thing for the core models, especially because we want to develop features that depend on them!

Guide to contributing:

Ensure you've read our contributing guidelines 📜
Claim your architecture(s) in this thread (ensure no one is working on it). It's 100% okay to only take the TensorFlow or PyTorch version of a model, if you're not familiar with both frameworks! It's also okay to claim multiple models and group those changes into a single PR! 🎯
Implement the changes as in https://github.com/huggingface/transformers/pull/16057 or https://github.com/huggingface/transformers/pull/16074 (see the diff on the model architectures for a few examples) 💪
Open the PR and tag me in it. You should run make fixup at the end to do a code quality check before your final commit!

Tips for making your PR

The files you need to edit will be in src/transformers/models/[model_name]/
For TensorFlow, you want the modeling_tf_[model_name].py file. For PyTorch, you want the modeling_[model_name].py file.
Remember, you do not have to cover every class in that file! The main thing we want to cover is the call (for TF) or forward (for PT) method for user-facing classes like TFRobertaForMaskedLM or RobertaForSequenceClassification. It's not necessary to add type hints to layers or base classes like RobertaModel or TFRobertaPreTrainedModel - these are trickier to write, and generally people do not use those classes as standalone models.
If you're unfamiliar with how type hints work, you can read the Python library documentation on them, but it's probably even easier to just look at another PR that added them. Take a look at the list of changes in the pull requests linked above!
The types will usually be obvious - most inputs are Optional[Union[np.ndarray, tf.Tensor]] for TF models and Optional[torch.Tensor] for PyTorch models, and boolean inputs are Optional[bool]. Pay attention to the first input of TF models, though, which is usually TFModelInputType - this is because Keras handles that first input in a special way! Other inputs to pay attention to are past_key_values, which can vary between models, and also the model output type. For the base model classes like RobertaModel, you may have to look at the corresponding MainLayer to figure out the right output type! Also, note that the output type may be a tuple if return_dict is False, in which case you should specify Union[Tuple, ...]. Finally, note that in TF models, training is never None, so it should be training: bool and not training: Optional[bool].
Note that some code is copied across our codebase. If you see a line like # Copied from transformers.models.bert..., this means that the code is copied from that source, and our scripts will automatically keep that in sync. If you see that, you should not edit the copied method! Instead, edit the original method it's copied from, and run make fixup to synchronize that across all the copies. Be sure you installed the development dependencies with pip install -e ".[dev]", as described in the contributor guidelines above, to ensure that the code quality tools in make fixup can run.
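
As a rough illustration of the conventions in the tips above, here is a minimal sketch of a hinted forward signature. To keep it runnable without PyTorch installed, Tensor is a stand-in for torch.Tensor, and ToyOutput and ToyForSequenceClassification are hypothetical names made up for this example, not classes from the library. The real PRs linked above are the authoritative reference.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union, get_type_hints


class Tensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch (assumption)."""


@dataclass
class ToyOutput:
    """Hypothetical stand-in for a model output class."""

    logits: Optional[Tensor] = None


class ToyForSequenceClassification:
    """Hypothetical user-facing model class; only the signature matters here."""

    def forward(
        self,
        input_ids: Optional[Tensor] = None,
        attention_mask: Optional[Tensor] = None,
        labels: Optional[Tensor] = None,
        output_attentions: Optional[bool] = None,  # boolean inputs are Optional[bool]
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, ToyOutput]:
        # With return_dict=False the model returns a plain tuple,
        # which is why the return type is a Union with Tuple.
        output = ToyOutput(logits=Tensor())
        if return_dict is False:
            return (output.logits,)
        return output


# Inspect the resolved hints to confirm the signature is fully annotated.
hints = get_type_hints(ToyForSequenceClassification.forward)
print(sorted(hints))
```

In a real PR you would import torch and the model's actual output class instead of the stand-ins, and leave the method body untouched.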

How can I find models that need type hints?

I used to maintain a list here, but it got out of date, I'm sorry. Instead, you can use this Colab notebook. If you run this, it will show you models in PyTorch or TF that are still missing type hints. Unlike my manually curated lists, it's guaranteed to be up to date - but do double-check that someone else in the thread hasn't claimed a model before you start, because the Colab code will only register type hints after the PR containing them is merged!
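
If you want to check a single method locally, one possible approach (a sketch only, not the code the Colab notebook uses) is to look at the method's signature with the standard inspect module. The helper name missing_hints and the two toy classes are hypothetical.

```python
import inspect
from typing import List, Optional


def missing_hints(func) -> List[str]:
    """Return names of parameters (and 'return') that have no annotation."""
    sig = inspect.signature(func)
    missing = [
        name
        for name, param in sig.parameters.items()
        if name != "self" and param.annotation is inspect.Parameter.empty
    ]
    if sig.return_annotation is inspect.Signature.empty:
        missing.append("return")
    return missing


class HintedModel:
    # Hypothetical class with a fully annotated forward method.
    def forward(self, input_ids: Optional[int] = None) -> tuple:
        return (input_ids,)


class UnhintedModel:
    # Hypothetical class whose forward method has no annotations at all.
    def forward(self, input_ids=None, attention_mask=None):
        return (input_ids, attention_mask)


print(missing_hints(HintedModel.forward))    # []
print(missing_hints(UnhintedModel.forward))  # ['input_ids', 'attention_mask', 'return']
```

The same helper could be pointed at the forward or call method of a real model class once the library is installed.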

Mar 10 '22 19:03 Rocketknight1

I would love to work on PyTorch Albert🚀

Mar 11 '22 17:03 divyanshugit

Hi, I would like to work on PyTorch ImageGPT

Mar 11 '22 17:03 johnnv1

Hi, I would like to work on CamemBERT for PT & TF.

I will take a look at LayoutLMv2 after the first one :smiley:

Edit: Because CamemBert depends on Roberta I will take PyTorch Roberta :+1:

Mar 11 '22 17:03 chainyo

Hello!

I'd like to take Hubert & Wav2Vec2 for Pytorch.

Cheers!

Mar 11 '22 17:03 Vaibhavs10

I'll try PyTorch BERT to start!

Mar 11 '22 17:03 johnryan465

@johnryan465 I just did it as an example, I'm sorry! I'm marking off the completed models now.

Mar 11 '22 17:03 Rocketknight1

@Rocketknight1 no worries, will try and do DistilBert instead

Mar 11 '22 17:03 johnryan465

I'd like to work on GPT2 (TF).

Mar 11 '22 17:03 cakiki

@Rocketknight1 I switched to PyTorch Roberta because CamemBERT depends on the Roberta modeling code

Mar 11 '22 17:03 chainyo

Awesome! Hey @Rocketknight1 – I'd like to work on Longformer for both PyTorch & TF!

Mar 11 '22 17:03 johnnygreco

I'd like to work on BigBird

Mar 11 '22 18:03 tanmoyio

I would like to work on Clip for pytorch.

Mar 11 '22 19:03 jacobdineen

Also, will work on BeiT, Deit and ViT (Pytorch)

Mar 11 '22 19:03 johnnv1

I can work on ImageGPT.

Mar 11 '22 21:03 bhavika

I can work on Swin (Pytorch)

Mar 11 '22 22:03 omer-dor

I'd like to work on XLM (Tensorflow)

Mar 11 '22 23:03 elusenji

I'll take T5 (Tensorflow)!

Mar 11 '22 23:03 Dahlbomii

I'd like to claim GPT-2 (PyTorch).

Mar 12 '22 00:03 KristijanArmeni

Hi @Rocketknight1,

I would like to work on BART of both TF and PyTorch

Mar 12 '22 03:03 robotjellyzone

ELECTRA TF - https://github.com/huggingface/transformers/pull/16104
ELECTRA PT - https://github.com/huggingface/transformers/pull/16103
DeBERTA PT - https://github.com/huggingface/transformers/pull/16105

Mar 12 '22 06:03 kamalkraj

XLMRobertaXL (PyTorch)

Mar 12 '22 07:03 manandey

segformer pytorch

Mar 12 '22 08:03 p-mishra1

I'll take OpenAIGPT!

Mar 12 '22 09:03 TristanBilot

Hi @Rocketknight1,

I would like to work on BART of both TF and PyTorch

Can you please confirm with an emoji whether I am eligible to take these or not? @Rocketknight1

Mar 12 '22 12:03 robotjellyzone

I will work on XLM (PyTorch)

Mar 12 '22 13:03 jbrry

@robotjellyzone You can! Please note that we accepted a PR yesterday to add the TF decorator to BART, so make sure you're working on the most recent version of the library before you start your PR!

Mar 12 '22 14:03 Rocketknight1

I'll take Distilbert (TensorFlow)

Mar 12 '22 16:03 PepijnBoers

Happy to take T5 (PyTorch)

@Rocketknight1 isn't the list missing ConvNext? If so, I'm happy to take care of that one too :ok_hand:

Mar 12 '22 17:03 frgfm

I'll work on GPTJ

Mar 12 '22 17:03 tmastrom

@robotjellyzone You can! Please note that we accepted a PR yesterday to add the TF decorator to BART, so make sure you're working on the most recent version of the library before you start your PR!

OK sure! I will keep this in mind 😊👍...

Mar 12 '22 18:03 robotjellyzone
