Rusheb Shah comments

Results: 9 comments by Rusheb Shah

Add tests + better docs for FactoredMatrix

Hello, I would like to help with adding unit tests. I may find it hard to make all the changes at once, and it would be good to work...

Add tests + better docs for FactoredMatrix

I think that all the tests for this are actually done as of #218. @jbloomAus maybe we should close this ticket and spin out a new one for docs.

[Proposal] BERT: Future work

Hi Tim, I think the work is up for grabs if you are keen. Probably worth checking with @jbloomAus what his priorities are and spinning out a ticket for anything...

[Bug Report] GPU Memory is leaked when HookedTransformer goes out of scope.

Hi @tbenthompson, I've reproduced this on TransformerLens v1.2.2. What version of TransformerLens did you use to reproduce this? I'm just trying to narrow down how long this has been an...

[Bug Report] GPU Memory is leaked when HookedTransformer goes out of scope.

I have been investigating this with @pranavgade20 today. We haven't got to the bottom of the issue, but I'm leaving some notes here for anybody to pick up. Circular reference: The...
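The circular-reference notes are truncated here, but the general mechanism can be illustrated. A minimal sketch, assuming CPython's reference counting: an object that is part of a reference cycle is not freed when its last name goes away, only when the cycle collector runs. `Node`, `freed_on_del` and `freed_after_collect` are hypothetical names; this is not TransformerLens code and does not identify the actual cycle involving `HookedTransformer`.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for a large object (e.g. a model that owns GPU tensors)."""

    def __init__(self) -> None:
        self.other = None


def freed_on_del(with_cycle: bool) -> bool:
    """Return True if the object is freed as soon as its last name is deleted."""
    gc.disable()  # turn off automatic cycle collection so the result is deterministic
    try:
        obj = Node()
        if with_cycle:
            obj.other = obj  # self-reference: the refcount can never reach zero on its own
        ref = weakref.ref(obj)  # observe the object without keeping it alive
        del obj
        return ref() is None
    finally:
        gc.enable()


def freed_after_collect() -> bool:
    """Same cycle, but run the cycle collector explicitly after deleting the name."""
    gc.disable()
    try:
        obj = Node()
        obj.other = obj
        ref = weakref.ref(obj)
        del obj
        gc.collect()  # an explicit collection still runs while automatic gc is disabled
        return ref() is None
    finally:
        gc.enable()


print(freed_on_del(with_cycle=False))  # True
print(freed_on_del(with_cycle=True))   # False
print(freed_after_collect())           # True
```

If a model holding CUDA tensors sits in such a cycle, its GPU memory would likewise stay allocated until a collection runs. Calling `gc.collect()` (and then, with PyTorch, `torch.cuda.empty_cache()` to release cached blocks) is a common thing to try, not a confirmed fix for this issue.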

[Bug Report] GPU Memory is leaked when HookedTransformer goes out of scope.

I couldn't reproduce the issue on CPU on an M1 Mac running OS X. Confusingly, the 410m TransformerLens model seems to be using basically no RAM. EDIT: I ran it for longer...

[Bug Report] GPU Memory is leaked when HookedTransformer goes out of scope.

OK, after playing around with various tools, I ran the [fil profiler](https://pythonspeed.com/articles/python-server-memory-leaks/). As you can see in the screenshot below, the process is allocating nearly 15GB of memory to load...
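Fil profiles allocations for the whole process. For a quick, stdlib-only check of how much memory a single step allocates through Python's own allocators, `tracemalloc` can be used as sketched below. This is a swapped-in technique, not what was run above, and `peak_python_alloc` is a hypothetical helper. Memory allocated directly by native libraries (possibly including PyTorch's tensor storage, and certainly GPU memory) may not be visible to `tracemalloc`, which is one reason a whole-process profiler such as Fil suits this investigation better.

```python
import tracemalloc
from typing import Any, Callable, Tuple


def peak_python_alloc(fn: Callable[[], Any]) -> Tuple[Any, int]:
    """Run fn() and return (result, peak bytes traced by tracemalloc during the call)."""
    tracemalloc.start()
    try:
        result = fn()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # also discards the collected traces
    return result, peak


# Allocate a 10 MB bytes object; the reported peak should be at least that large.
data, peak = peak_python_alloc(lambda: bytes(10_000_000))
print(f"peak: {peak / 2**20:.1f} MiB")
```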

[Bug Report] GPU Memory is leaked when HookedTransformer goes out of scope.

Update! I've managed to get the memory usage down from 15GiB to 5GiB in my test. If you compare the diagram below with the one from my previous post, all...

[Bug Report] GPU Memory is leaked when HookedTransformer goes out of scope.

Unfortunately, when I reran @tbenthompson's reproducing example, the GPU memory usage was unchanged. It seems like this is a real issue, but it might be a separate one.