Easy-Transformer
[Bug Report] Fix `n_params` counts
Describe the bug
The `n_params` counts calculated here are wrong. For example, Llama uses SwiGLU, so the 2x factor in the linked code is wrong. Further, this seems to ignore bias parameters entirely.
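To make the discrepancy concrete, here is a back-of-the-envelope sketch (illustrative arithmetic, not repo code) using the published Llama-7B hyperparameters. The 2x MLP factor reproduces the reported figure exactly, while a SwiGLU 3x factor (gate, up, and down projections), plus embeddings and RMSNorm scales, recovers Llama-7B's actual ~6.74B total:

```python
# Back-of-the-envelope count using published Llama-7B hyperparameters.
d_model, d_mlp, n_layers, d_vocab = 4096, 11008, 32, 32000

attn = 4 * d_model * d_model          # W_Q, W_K, W_V, W_O per layer (no biases)
mlp_2x = 2 * d_model * d_mlp          # the 2x factor in the linked code
mlp_swiglu = 3 * d_model * d_mlp      # SwiGLU: gate, up, and down projections

print(n_layers * (attn + mlp_2x))     # 5033164800 -- the buggy value reported below

blocks = n_layers * (attn + mlp_swiglu)
embeds = 2 * d_vocab * d_model        # embed + unembed (not tied in Llama)
norms = (2 * n_layers + 1) * d_model  # RMSNorm scales
print(blocks + embeds + norms)        # 6738415616 -- Llama-7B's actual count
```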
Code example
```python
from transformer_lens import HookedTransformer

# Load in Llama-7B (model name as registered in TransformerLens;
# the Llama weights themselves must be available locally)
llama = HookedTransformer.from_pretrained("llama-7b")

llama.cfg.n_params  # 5033164800 ...
```
System Info

N/A
Additional context

N/A
Checklist
- [x] I have checked that there is no similar issue in the repo (required)
Unclear what the solution should be.
There are plausibly three different parameter counts that are helpful:
- Parameters in training
- Parameters ignoring embeddings
- Parameters used now (e.g. folding LayerNorm deletes some parameters)
I would appreciate people stating which parameter counts are most helpful to them (a sketch of how these counts diverge follows below).
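For a sense of scale, here is a rough sketch (illustrative arithmetic under the public GPT-2 small config, not repo code) of how far apart those three counts can be:

```python
# Rough sketch with GPT-2 small hyperparameters (tied embed/unembed).
d_model, n_ctx, d_vocab, d_mlp, n_layers = 768, 1024, 50257, 3072, 12

attn = 4 * d_model * d_model + 4 * d_model    # W_Q/K/V/O plus biases
mlp = 2 * d_model * d_mlp + d_mlp + d_model   # W_in/W_out plus biases
block_lns = 2 * 2 * d_model                   # two LayerNorms (scale + bias)
embeds = (d_vocab + n_ctx) * d_model          # token + positional embeddings
all_lns = n_layers * block_lns + 2 * d_model  # per-block LNs + final LN

in_training = n_layers * (attn + mlp + block_lns) + 2 * d_model + embeds
print(in_training)            # 124439808 -- parameters in training
print(in_training - embeds)   # 85056000  -- ignoring embeddings
print(in_training - all_lns)  # 124401408 -- after folding LayerNorm
```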
IMO this should just be total parameters, for simplicity and for alignment with the Pythia suite. Who cares about LayerNorm?