Andy Arditi
# Description

Previously, we were allocating causal masks of size `(n_ctx, n_ctx)` for every instantiation of `AbstractAttention`, where `n_ctx` corresponds to the _maximum_ context length. For models with a large...
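To make the cost concrete, here is a back-of-the-envelope sketch (not TransformerLens code; the `n_ctx` and `n_layers` values are illustrative assumptions, not taken from the issue) of how much memory full `(n_ctx, n_ctx)` boolean masks consume when one is allocated per attention module:

```python
import torch

# Illustrative values (assumptions): a model with a maximum context
# length of 2048 and one attention module per layer.
n_ctx = 2048
n_layers = 32

# A full lower-triangular causal mask, one per attention module.
mask = torch.tril(torch.ones(n_ctx, n_ctx)).bool()

per_mask_bytes = mask.numel() * mask.element_size()  # 1 byte per bool
print(f"one mask:   {per_mask_bytes / 2**20:.0f} MiB")             # 4 MiB
print(f"all layers: {n_layers * per_mask_bytes / 2**20:.0f} MiB")  # 128 MiB
```

The allocation grows quadratically in `n_ctx`, which is why models with large maximum context lengths are the ones that hurt.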
### Proposal

[Relatively minor proposal; considered making it a bug, but it's not *really* a bug.]

In the initialization of each `Attention` module, we [register](https://github.com/neelnanda-io/TransformerLens/blob/ce82675a8e89b6d5e6229a89620c843c794f3b04/transformer_lens/components.py#L440C9-L440C20) a `causal_mask` buffer. This...
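The proposal text is cut off above, but the pattern it describes can be sketched as follows. This is a minimal, hypothetical reconstruction rather than the actual `AbstractAttention` implementation: the buffer is registered eagerly at the full `n_ctx` size, while forward passes only ever read a `(seq_len, seq_len)` corner of it.

```python
import torch
import torch.nn as nn


class Attention(nn.Module):
    """Minimal sketch of the pattern described above; not the actual
    TransformerLens implementation."""

    def __init__(self, n_ctx: int):
        super().__init__()
        # Eagerly allocates an (n_ctx, n_ctx) bool buffer, where n_ctx is
        # the model's *maximum* context length -- even though forward
        # passes may only ever see much shorter sequences.
        causal_mask = torch.tril(torch.ones(n_ctx, n_ctx)).bool()
        self.register_buffer("causal_mask", causal_mask)

    def apply_causal_mask(self, attn_scores: torch.Tensor) -> torch.Tensor:
        # At runtime only the top-left (seq_len, seq_len) corner of the
        # buffer is ever read.
        seq_len = attn_scores.size(-1)
        return attn_scores.masked_fill(
            ~self.causal_mask[:seq_len, :seq_len], float("-inf")
        )
```

One fix consistent with the description (an assumption on my part, since the proposal is truncated) would be to build the mask on the fly at the actual sequence length instead of registering it at `n_ctx` size:

```python
def lazy_causal_mask(seq_len: int, device: torch.device) -> torch.Tensor:
    # Hypothetical alternative: a (seq_len, seq_len) mask built per forward
    # pass, avoiding a persistent (n_ctx, n_ctx) buffer on the module.
    return torch.tril(torch.ones(seq_len, seq_len, device=device)).bool()
```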