[Proposal] BERT: Future work
Proposal
This ticket documents enhancements to BERT support in TransformerLens that were out of scope for the MVP (PR #276), following on from issue #258.
These items can be prioritised and spun into separate tickets as necessary.
- [ ] Expand the demo notebook (`demos/BERT.ipynb`). The notebook should include:
  - a runthrough of the BERT architecture and what makes it different to GPT
  - how the components are different, e.g. TokenType embedding, Post-LayerNorm
  - the masked language modelling (MLM) task and how it differs from causal language modelling
  - notes about loss and why it doesn't make sense for `HookedEncoder.forward` to return the loss (see the MLM-loss sketch after this list)
  - examples of using HookedEncoder to do the same types of things people would do with HookedTransformer, highlighting similarities and differences (see the usage sketch after this list)
- [ ] User-test BERT support by using it to do proper interpretability research
- [ ] Add examples of research using BERT to the demo notebooks
- [ ] Add support for different tasks, e.g. Next Sentence Prediction, Causal Language Modelling
- [ ] Add more models: bert-base-uncased, bert-large-cased, bert-large-uncased
- [ ] Add preprocessing of weights, including LayerNorm folding (see the folding sketch after this list)
- [ ] Accept strings as input and add tokenization helpers from HookedTransformer
- [ ] Add support for training/finetuning (most notably, dropouts)
- [ ] Add tests for HookedEncoder convenience properties (e.g. `W_U`, `b_U`, `W_E`, etc.); a shape-test sketch is included after this list
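For reference, a minimal usage sketch of "doing the same things as HookedTransformer" with HookedEncoder. This is illustrative and untested: the cache keys are assumed to follow HookedTransformer's naming, and tokenization is done manually because HookedEncoder does not yet accept strings (see the task above).

```python
# Rough sketch (untested) of HookedTransformer-style usage with HookedEncoder.
from transformers import AutoTokenizer
from transformer_lens import HookedEncoder

model = HookedEncoder.from_pretrained("bert-base-cased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# HookedEncoder does not accept strings yet, so tokenize manually.
tokens = tokenizer("The [MASK] sat on the mat.", return_tensors="pt")["input_ids"]

# Same run_with_cache / ActivationCache interface as HookedTransformer (assumed).
logits, cache = model.run_with_cache(tokens)
attn_pattern_layer0 = cache["pattern", 0]  # [batch, n_heads, pos, pos]
print(logits.shape, attn_pattern_layer0.shape)
```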
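On the loss point: with MLM the labels are the original tokens at the masked positions, which the forward pass has no way of knowing, so the loss naturally lives outside `HookedEncoder.forward` (unlike causal LM, where the labels are just the input shifted by one). A rough sketch of computing it externally; the function name and shapes are illustrative, not part of the library:

```python
import torch
import torch.nn.functional as F

def mlm_loss(logits: torch.Tensor, labels: torch.Tensor, is_masked: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over masked positions only.

    logits:    [batch, pos, d_vocab] MLM-head logits returned by the model
    labels:    [batch, pos] original (pre-masking) token ids
    is_masked: [batch, pos] bool, True where the input had [MASK]
    """
    return F.cross_entropy(logits[is_masked], labels[is_masked])
```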
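For the LayerNorm folding item, the idea is the same as HookedTransformer's weight preprocessing: the learned scale and bias of a LayerNorm can be absorbed into whichever linear layer reads from it, leaving only the centring and normalisation as a nonlinearity. A back-of-the-envelope sketch (shapes and names are illustrative, not the actual TransformerLens code):

```python
import torch

def fold_layernorm(W: torch.Tensor, b: torch.Tensor,
                   ln_w: torch.Tensor, ln_b: torch.Tensor):
    """Fold LayerNorm scale/bias into a following linear layer.

    W:    [d_model, d_out] weights reading from the normalised residual stream
    b:    [d_out]          bias of that linear layer
    ln_w: [d_model]        LayerNorm scale (gamma)
    ln_b: [d_model]        LayerNorm bias (beta)

    (gamma * x_hat + beta) @ W + b == x_hat @ (gamma[:, None] * W) + (beta @ W + b)
    """
    return ln_w[:, None] * W, b + ln_b @ W
```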
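And a sketch of the kind of shape test the last item describes (pytest-style; the property names and shape conventions are assumptions carried over from HookedTransformer):

```python
import pytest
from transformer_lens import HookedEncoder

@pytest.fixture(scope="module")
def model():
    return HookedEncoder.from_pretrained("bert-base-cased")

def test_embedding_and_unembedding_shapes(model):
    cfg = model.cfg
    assert model.W_E.shape == (cfg.d_vocab, cfg.d_model)  # token embedding
    assert model.W_U.shape == (cfg.d_model, cfg.d_vocab)  # unembedding
    assert model.b_U.shape == (cfg.d_vocab,)              # unembedding bias
```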
Checklist
- [x] I have checked that there is no similar issue in the repo (required)
@rusheb what's the status on this?
I'd love to contribute to this.
Hi Tim, I think the work is up for grabs if you are keen. Probably worth checking with @jbloomAus what his priorities are and spinning out a ticket for anything you decide to work on.
I'm happy to discuss specifics if that would be helpful.