Neel Nanda comments

Results: 35 comments of Neel Nanda

Consider adding options for rapid tests

I think FNR of 42% significantly under-rates lateral flow tests. The thing I care about is not being infected, infectiousness increases with viral load, and lateral flow tests are _way_...

[Bug Report] HookedTranformer.generate() with model.tokenizer unset gives pad_token_id error

Interesting! What's the use case? Either way, it'd be an easy fix to just add an optional pad_token_id parameter to the generate function, feel free to make a PR
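A minimal sketch of the fallback logic such an optional parameter could use, assuming an explicitly passed `pad_token_id` should take priority over the tokenizer's. The helper name `resolve_pad_token_id` is hypothetical and not part of TransformerLens; only the `pad_token_id` parameter name comes from the comment above.

```python
from typing import Any, Optional


def resolve_pad_token_id(
    tokenizer: Optional[Any] = None,
    pad_token_id: Optional[int] = None,
) -> int:
    """Pick the pad token id to use during generation (hypothetical helper)."""
    # An explicitly passed id always wins, so no tokenizer is required.
    if pad_token_id is not None:
        return pad_token_id
    # Otherwise fall back to the tokenizer's pad token, if one is set.
    tokenizer_pad = getattr(tokenizer, "pad_token_id", None)
    if tokenizer_pad is not None:
        return tokenizer_pad
    raise ValueError(
        "No pad_token_id was given and no tokenizer pad token is set; "
        "pass pad_token_id explicitly."
    )
```

With something like this, a `generate` function could accept `pad_token_id=None` and call the helper once up front, so a model with no tokenizer set would still work as long as the caller supplies the id.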

Add a helper function to display vectors of logits nicely

Thanks! I'll admit that those takes were too in depth for me to really get my head around them, but it sounded interesting and I would love to see a...

Add a helper function to display vectors of logits nicely

Ah, yes, if you're imagining a real interactive visualisation, putting it in CircuitsVis seems more natural. It's set up to be easy to integrate JavaScript code and Python.

[Question] load_state_dict: copy vs assign

Interesting, I wasn't aware of that parameter, thanks! Looks like [it's new to torch 2.0](https://pytorch.org/docs/1.10.0/generated/torch.nn.Module.html?highlight=load_state_dict#torch.nn.Module.load_state_dict). I _think_ that TransformerLens doesn't require torch 2.0, so can't use it. On the other...

Fix to include ln_final.w in RMSNorm hook

I'm open to pushback, but currently think that hook_normalized is working as intended, because it's invariant to whether or not layer norm is folded.

[Bug Report] Can't add hook to pretrained model: AssertionError: Cannot add hook blocks.0.hook_q_input if use_split_qkv_input is False

Oh rip, that's probably because I explicitly wrote the caching in the get cache fwd and bwd function, and I think it just does every single hook in the model....

[Question] demo of 4bit quantized Llama -- what's next?

[Proposal] Optionally use flash attention.

Add mixed precision inference incl loading