[Proposal] BERT: Future work
Proposal
This ticket documents enhancements to BERT support in TransformerLens that were out of scope for the MVP (PR #276), following on from issue #258.
These items can be prioritised and spun into separate tickets as necessary.
- [ ] Expand the demo notebook (`demos/BERT.ipynb`). The notebook should include:
  - a runthrough of the BERT architecture and what makes it different to GPT
  - how the components are different, e.g. TokenType embedding, Post-LayerNorm
  - the masked language modelling (MLM) task and how it differs from causal language modelling
  - notes about loss and why it doesn't make sense for `HookedEncoder.forward` to return the loss (see the MLM-loss sketch after this list)
  - examples of using HookedEncoder to do the same types of things people would do with HookedTransformer, highlighting similarities and differences (see the usage sketch after this list)
- [ ] User-test BERT support by using it to do proper interpretability research
- [ ] Add examples of research using BERT to the demo notebooks
- [ ] Add support for different tasks, e.g. Next Sentence Prediction, Causal Language Modelling
- [ ] Add more models: bert-base-uncased, bert-large-cased, bert-large-uncased
- [ ] Add preprocessing of weights, including LayerNorm folding (see the folding sketch after this list)
- [ ] Accept strings as input and add tokenization helpers from HookedTransformer
- [ ] Add support for training/finetuning (most notably, dropouts)
- [ ] Add tests for HookedEncoder convenience properties (e.g. `W_U`, `b_U`, `W_E`, etc.); a shape-test sketch is included after this list
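For reference, a minimal usage sketch of "doing the same things as HookedTransformer" with HookedEncoder. This is illustrative and untested: the cache keys are assumed to follow HookedTransformer's naming, and tokenization is done manually because HookedEncoder does not yet accept strings (see the task above).

```python
# Rough sketch (untested) of HookedTransformer-style usage with HookedEncoder.
from transformers import AutoTokenizer
from transformer_lens import HookedEncoder

model = HookedEncoder.from_pretrained("bert-base-cased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# HookedEncoder does not accept strings yet, so tokenize manually.
tokens = tokenizer("The [MASK] sat on the mat.", return_tensors="pt")["input_ids"]

# Same run_with_cache / ActivationCache interface as HookedTransformer (assumed).
logits, cache = model.run_with_cache(tokens)
attn_pattern_layer0 = cache["pattern", 0]  # [batch, n_heads, pos, pos]
print(logits.shape, attn_pattern_layer0.shape)
```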
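On the loss point: with MLM the labels are the original tokens at the masked positions, which the forward pass has no way of knowing, so the loss naturally lives outside `HookedEncoder.forward` (unlike causal LM, where the labels are just the input shifted by one). A rough sketch of computing it externally; the function name and shapes are illustrative, not part of the library:

```python
import torch
import torch.nn.functional as F

def mlm_loss(logits: torch.Tensor, labels: torch.Tensor, is_masked: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over masked positions only.

    logits:    [batch, pos, d_vocab] MLM-head logits returned by the model
    labels:    [batch, pos] original (pre-masking) token ids
    is_masked: [batch, pos] bool, True where the input had [MASK]
    """
    return F.cross_entropy(logits[is_masked], labels[is_masked])
```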
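For the LayerNorm folding item, the idea is the same as HookedTransformer's weight preprocessing: the learned scale and bias of a LayerNorm can be absorbed into whichever linear layer reads from it, leaving only the centring and normalisation as a nonlinearity. A back-of-the-envelope sketch (shapes and names are illustrative, not the actual TransformerLens code):

```python
import torch

def fold_layernorm(W: torch.Tensor, b: torch.Tensor,
                   ln_w: torch.Tensor, ln_b: torch.Tensor):
    """Fold LayerNorm scale/bias into a following linear layer.

    W:    [d_model, d_out] weights reading from the normalised residual stream
    b:    [d_out]          bias of that linear layer
    ln_w: [d_model]        LayerNorm scale (gamma)
    ln_b: [d_model]        LayerNorm bias (beta)

    (gamma * x_hat + beta) @ W + b == x_hat @ (gamma[:, None] * W) + (beta @ W + b)
    """
    return ln_w[:, None] * W, b + ln_b @ W
```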
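And a sketch of the kind of shape test the last item describes (pytest-style; the property names and shape conventions are assumptions carried over from HookedTransformer):

```python
import pytest
from transformer_lens import HookedEncoder

@pytest.fixture(scope="module")
def model():
    return HookedEncoder.from_pretrained("bert-base-cased")

def test_embedding_and_unembedding_shapes(model):
    cfg = model.cfg
    assert model.W_E.shape == (cfg.d_vocab, cfg.d_model)  # token embedding
    assert model.W_U.shape == (cfg.d_model, cfg.d_vocab)  # unembedding
    assert model.b_U.shape == (cfg.d_vocab,)              # unembedding bias
```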
Checklist
- [x] I have checked that there is no similar issue in the repo (required)
@rusheb what's the status on this?
I'd love to contribute to this.
Hi Tim, I think the work is up for grabs if you are keen. Probably worth checking with @jbloomAus what his priorities are and spinning out a ticket for anything you decide to work on.
I'm happy to discuss specifics if that would be helpful.