Alan Cooney comments

Results: 21 comments by Alan Cooney

Construct causal mask on-the-fly

Thanks for looking into this! I guess the most efficient way would be to construct it once per model rather than once per head? However this would potentially break some...

Construct causal mask on-the-fly

Also pinged you directly with a potential hacky (but more efficient) fix using a static property
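A minimal sketch of the "once per model rather than once per head" idea from the two comments above, in plain Python. This is not the TransformerLens implementation: `causal_mask` and `Head` are hypothetical names, nested tuples stand in for a tensor, and a module-level `lru_cache` is swapped in for the static property mentioned above.

```python
from functools import lru_cache
from typing import Tuple


@lru_cache(maxsize=None)
def causal_mask(n_ctx: int) -> Tuple[Tuple[bool, ...], ...]:
    """Lower-triangular mask: entry [q][k] is True when query q may attend to key k."""
    return tuple(tuple(k <= q for k in range(n_ctx)) for q in range(n_ctx))


class Head:
    """Hypothetical stand-in for an attention head (assumption, not a real class)."""

    def __init__(self, n_ctx: int) -> None:
        self.n_ctx = n_ctx

    @property
    def mask(self) -> Tuple[Tuple[bool, ...], ...]:
        # Every head with the same context length gets the same cached object,
        # so the mask is built once rather than once per head.
        return causal_mask(self.n_ctx)


heads = [Head(4) for _ in range(8)]
```

The trade-off is the one hinted at in the comments: sharing one mask object saves memory and construction time, but anything that expects each head to own its own copy would need to be checked.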

[Bug Report] Load model to multiple devices

Thanks - feel free to submit a PR for this!

[Bug Report] Load model to multiple devices

Not yet, I'm afraid. There's a task involved here to remove most of the manual device setting throughout the codebase (e.g. `.to(device=)` and `tensor([], device=)`), as torch handles most of...

[Proposal] Support Falcon

Agreed this would be great! Would you be interested in writing a PR for it?

[Bug Report] `load_and_process_state_dict` handles LayerNorm folding poorly

If anyone wants to add a PR to improve the error message here, that would be great.

[Proposal] Have ActivationCache.get_full_resid_decomposition support passing in a vector/tensor to project onto

Will take a look at this whilst doing recursive DLA

Add CD to publish PyPi Package

Ah yes I assumed that the build action was something else - we can just reference this directly for the checks part (and probably rename it to checks). I'll create...

Add CD to publish PyPi Package

Sorry it hasn't been done yet. My personal opinion is that https://github.com/neelnanda-io/TransformerLens/blob/main/.github/workflows/release.yml represents best practice on this as a starting point (though it needs to be switched to use pip/setuptools).

Remove dynamic imports

> I started on the yaml config removal and as I did that it led me to feel that we should reduce the need for instantiating classes by string, `train.trainer`/`optimizer.name`/`schedular.name` and...