Alan Cooney
Thanks for looking into this! I guess the most efficient way would be to construct it once per model rather than once per head? However this would potentially break some...
Also pinged you directly with a potential hacky (but more efficient) fix using a static property
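For anyone following along, here is a rough sketch of what a "construct it once" fix with a class-level (static) cache could look like. It is only an illustration: `Head`, `_mask_cache`, and the causal mask are hypothetical stand-ins, since the thread does not say which object is being rebuilt per head.

```python
class Head:
    """Hypothetical attention head; not taken from the actual codebase."""

    # Class-level (static) cache shared by every Head instance,
    # keyed by context length so different sizes do not collide.
    _mask_cache: dict = {}
    build_calls = 0  # counts how often the expensive construction runs

    def __init__(self, n_ctx: int):
        self.n_ctx = n_ctx

    @property
    def mask(self) -> tuple:
        # Build the lower-triangular (causal) mask only on first access.
        if self.n_ctx not in Head._mask_cache:
            Head.build_calls += 1
            Head._mask_cache[self.n_ctx] = tuple(
                tuple(j <= i for j in range(self.n_ctx))
                for i in range(self.n_ctx)
            )
        return Head._mask_cache[self.n_ctx]


# Eight heads, one construction.
heads = [Head(n_ctx=4) for _ in range(8)]
masks = [h.mask for h in heads]
```

The trade-off is the one hinted at above: shared state is cheaper, but anything that mutates the cached object or expects a separate copy per head could break.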
Thanks - feel free to submit a PR for this!
Not yet, I'm afraid. There's a task involved here to remove most of the manual device setting throughout the codebase (e.g. `.to(device=)` and `tensor([], device=)`), as torch handles most of...
Agreed this would be great! Would you be interested in writing a PR for it?
If anyone wants to add a PR to improve the error message here that would be great
Will take a look at this whilst doing recursive DLA
Ah yes I assumed that the build action was something else - we can just reference this directly for the checks part (and probably rename it to checks). I'll create...
Sorry it hasn't been done yet. My personal opinion is that https://github.com/neelnanda-io/TransformerLens/blob/main/.github/workflows/release.yml represents best practice on this as a starting point (though it needs to be switched to use pip/setuptools).
> I started on the yaml config removal and as I did that it led me to feel that we should reduce the need for instantiating classes by string, `train.trainer`/`optimizer.name`/`schedular.name` and...
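
To make the "instantiating classes by string" point concrete, here is a minimal sketch contrasting a string lookup with passing the class directly. Everything in it is hypothetical: `SGD`, `Adam`, `OPTIMIZERS`, and both helper functions are toy stand-ins, not the project's real config keys or torch optimizers.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SGD:
    """Toy stand-in for an optimizer class (hypothetical)."""
    lr: float


@dataclass
class Adam:
    """Toy stand-in for an optimizer class (hypothetical)."""
    lr: float


# String-based: the name is resolved at runtime through a registry,
# so a typo in the config only shows up as a KeyError when this runs.
OPTIMIZERS: Dict[str, Callable[[float], object]] = {"sgd": SGD, "adam": Adam}


def make_optimizer_by_name(name: str, lr: float) -> object:
    return OPTIMIZERS[name](lr)


# Direct: the caller passes the class itself, so editors and type
# checkers can see exactly what is being constructed.
def make_optimizer(factory: Callable[[float], object], lr: float) -> object:
    return factory(lr)


by_name = make_optimizer_by_name("adam", 0.001)
direct = make_optimizer(Adam, 0.001)
```

Both calls build the same object; the difference is only where a mistake gets caught.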