Easy-Transformer
Better docs for model properties
Make this table better and cover key info about the model architecture: whether the model uses parallel attention & MLPs, and which positional embedding it uses.
Add text at the bottom documenting the models more qualitatively; this glossary can basically be copied: https://docs.google.com/document/d/1WONBzNqfKIxERejrrPlQMyKqg7jSFW92x5UMXNrMdPo/edit#heading=h.chq47zvs9cii
I'd also want to add a separate table with training info: training dataset, number of tokens, whether the models were trained with dropout, whether they have checkpoints, and whether they were trained with weight decay.
Bonus for doing this in a way that can be automatically updated as more models are added, but this is less important
I guess you mean the Model Properties table. I find the column names confusing and I agree it would need a proper clean-up. In particular, I am confused about the `d_` and `n_` prefixes. What is the difference between `d_head` and `n_heads`? I am also confused why attention-only models have a non-zero `d_mlp`.
I am new to TransformerLens but would like to contribute. If you could point me to resources that contain metadata about the models available in the package, I could get started.
> Bonus for doing this in a way that can be automatically updated as more models are added, but this is less important

What is the release process for models? I can think of a solution if you give me some guidance.
`d_head` is the dimension of each attention head in multi-headed attention, and `n_heads` is the number of heads per layer. The non-zero `d_mlp` on attention-only models is a bug, I guess: you can turn off the MLP without setting it to zero, so it's not the biggest deal.
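To make the relationship between the two concrete, here is a quick sanity check using GPT-2 small's publicly documented dimensions (the numbers are hard-coded for illustration, not read from TransformerLens at runtime):

```python
# GPT-2 small dimensions (well-known published values, hard-coded here)
n_heads = 12   # n_heads: number of attention heads per layer
d_head = 64    # d_head: dimension of each individual head
d_model = 768  # d_model: width of the residual stream

# In standard multi-headed attention the per-head outputs are concatenated,
# so heads-per-layer times dims-per-head recovers the model width.
assert n_heads * d_head == d_model
print(f"{n_heads} heads x {d_head} dims/head = {n_heads * d_head} = d_model")
```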
Models are typically added from Hugging Face, and the weights are accessed via the API when we call `from_pretrained`. If you want to get a sense of what many of those parameters refer to, you could go here: https://neelnanda-io.github.io/TransformerLens/transformer_lens.html#module-transformer_lens.HookedTransformerConfig
The Annotated Transformer might be helpful too as a more general resource: https://nlp.seas.harvard.edu/2018/04/03/attention.html.
I don't know how to source more metadata automatically if it isn't on Hugging Face.
This function will automatically get you the config. I just used it to generate that janky table, and I agree it could be much better! We could create a script that automatically generates the model table, and just ask people to run it as part of a new-model PR.
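A minimal sketch of what such a table-generating script could look like. The config dicts and the `make_table` helper below are hypothetical placeholders; a real version would pull each model's config from TransformerLens instead of hard-coding it:

```python
# Sketch: render a model-properties table as Markdown from config dicts.
# The entries below are hand-written placeholders for illustration only;
# a real script would fetch each model's config from TransformerLens.
CONFIGS = [
    {"name": "gpt2", "n_layers": 12, "n_heads": 12, "d_model": 768, "d_head": 64},
    {"name": "pythia-70m", "n_layers": 6, "n_heads": 8, "d_model": 512, "d_head": 64},
]

COLUMNS = ["name", "n_layers", "n_heads", "d_model", "d_head"]

def make_table(configs, columns):
    """Render a list of config dicts as a Markdown table (hypothetical helper)."""
    header = "| " + " | ".join(columns) + " |"
    divider = "|" + "|".join("---" for _ in columns) + "|"
    rows = ["| " + " | ".join(str(cfg[c]) for c in columns) + " |" for cfg in configs]
    return "\n".join([header, divider, *rows])

print(make_table(CONFIGS, COLUMNS))
```

Running this as a pre-PR step would keep the docs table in sync with the registered models automatically.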
As Joseph says, `d_` means the size of a dimension, and `n_` means the number of something (e.g. the number of heads or layers). It'd be good to add a key for this at the top or something?
Other things I'd want to add:
- Whether the model uses parallel attention (currently just the Pythia models and NeoX)
- Whether the model was trained with dropout (currently just GPT-2)
- Which tokenizer was used
Currently working on this in https://github.com/mivanit/transformerlens-model-table; I will make a PR once that is more cleaned up (and once I get around the Mixtral gated-repo issues).