Easy-Transformer
Better docs for model properties
Make this table better and cover key info about the model architecture: whether the model uses parallel attention & MLPs, and which positional embedding it uses.
Add text at the bottom documenting the models more qualitatively; this glossary can basically be copied: https://docs.google.com/document/d/1WONBzNqfKIxERejrrPlQMyKqg7jSFW92x5UMXNrMdPo/edit#heading=h.chq47zvs9cii
I'd also want to add a separate table with training info: training dataset, number of tokens, whether the models were trained with dropout, whether they have checkpoints, and whether they were trained with weight decay.
Bonus for doing this in a way that can be automatically updated as more models are added, but this is less important
I guess you mean the Model Properties table. I find the column names confusing and I agree it would need a proper clean-up. In particular, I am confused about the `d_` and `n_` prefixes. What is the difference between `d_head` and `n_heads`? I am also confused why attention-only models have a non-zero `d_mlp`.
I am new to TransformerLens but would like to contribute. If you could point me to resources that contain metadata about the models available in the package, I could get started.
> Bonus for doing this in a way that can be automatically updated as more models are added, but this is less important

What is the release process for models? I can think of a solution if you give me some guidance.
`d_head` is the dimension of each attention head in multi-headed attention, and `n_heads` is the number of heads per layer. The non-zero `d_mlp` on attention-only models is a bug, I guess: you can turn off the MLP without setting it to zero, so it's not the biggest deal.
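To make the relationship between the two concrete, here is a quick sanity check using GPT-2 small's publicly documented dimensions (the numbers are hard-coded for illustration, not read from TransformerLens at runtime):

```python
# GPT-2 small dimensions (well-known published values, hard-coded here)
n_heads = 12   # n_heads: number of attention heads per layer
d_head = 64    # d_head: dimension of each individual head
d_model = 768  # d_model: width of the residual stream

# In standard multi-headed attention the per-head outputs are concatenated,
# so heads-per-layer times dims-per-head recovers the model width.
assert n_heads * d_head == d_model
print(f"{n_heads} heads x {d_head} dims/head = {n_heads * d_head} = d_model")
```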
Models are typically added from Hugging Face, and the weights are accessed via the API when we call `from_pretrained`. If you want to get a sense of what many of those parameters refer to, you could go here: https://neelnanda-io.github.io/TransformerLens/transformer_lens.html#module-transformer_lens.HookedTransformerConfig
The Annotated Transformer might be helpful too as a more general resource: https://nlp.seas.harvard.edu/2018/04/03/attention.html.
I don't know how to source more metadata automatically if it isn't on Hugging Face.
This function will automatically get you the config. I just used it to generate that janky table, and I agree it could be much better! We could create a script that automatically generates the model table, and just ask people to run it as part of a new-model PR.
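A minimal sketch of what such a table-generating script could look like. The config dicts and the `make_table` helper below are hypothetical placeholders; a real version would pull each model's config from TransformerLens instead of hard-coding it:

```python
# Sketch: render a model-properties table as Markdown from config dicts.
# The entries below are hand-written placeholders for illustration only;
# a real script would fetch each model's config from TransformerLens.
CONFIGS = [
    {"name": "gpt2", "n_layers": 12, "n_heads": 12, "d_model": 768, "d_head": 64},
    {"name": "pythia-70m", "n_layers": 6, "n_heads": 8, "d_model": 512, "d_head": 64},
]

COLUMNS = ["name", "n_layers", "n_heads", "d_model", "d_head"]

def make_table(configs, columns):
    """Render a list of config dicts as a Markdown table (hypothetical helper)."""
    header = "| " + " | ".join(columns) + " |"
    divider = "|" + "|".join("---" for _ in columns) + "|"
    rows = ["| " + " | ".join(str(cfg[c]) for c in columns) + " |" for cfg in configs]
    return "\n".join([header, divider, *rows])

print(make_table(CONFIGS, COLUMNS))
```

Running this as a pre-PR step would keep the docs table in sync with the registered models automatically.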
As Joseph says, `d_` means the size of a dimension, and `n_` means the number of something (e.g. the number of heads or layers). It'd be good to add a key for this at the top or something?
Other things I'd want to add:
- Whether the model uses parallel attention (currently just the Pythia models and NeoX)
- Whether the model was trained with dropout (currently just GPT-2)
- Which tokenizer was used
Currently working on this in https://github.com/mivanit/transformerlens-model-table; I will make a PR once that is more cleaned up (and once I get around the Mixtral gated-repo issues).