Haoyan Luo

Results 3 issues of Haoyan Luo

# Description

Before the fix, `model_cache["ln_final.hook_normalized"]` returns only the RMS-normalized hidden states, without multiplying by the `final_ln` weight. This might contradict the design of this hook. I followed the...
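To make the distinction in this description concrete, here is a minimal pure-Python sketch of RMS normalization with and without the learned weight. It is an illustration only, not the library's actual implementation: the function names `rms_normalize` and `rms_norm`, the `eps` value, and the toy vectors are all assumptions made for this example.

```python
import math

def rms_normalize(hidden, eps=1e-5):
    """Hypothetical helper: divide a vector by its root-mean-square (no learned weight)."""
    mean_sq = sum(x * x for x in hidden) / len(hidden)
    scale = math.sqrt(mean_sq + eps)
    return [x / scale for x in hidden]

def rms_norm(hidden, weight, eps=1e-5):
    """Hypothetical helper: RMS-normalize, then multiply elementwise by the learned weight."""
    normalized = rms_normalize(hidden, eps)
    return [n * w for n, w in zip(normalized, weight)]

# Toy values, assumed for illustration only.
hidden = [3.0, 4.0]
weight = [2.0, 0.5]

normalized_only = rms_normalize(hidden)   # what the description says the hook returned before the fix
with_weight = rms_norm(hidden, weight)    # normalized states multiplied by the weight
```

The two results differ by exactly the elementwise weight, which is the gap the description points at.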

no-rebase

Hi there, thanks for your work! I want to ask about the source of the commonsense_15k dataset, as I couldn't find it described in either the paper or this repo.

Hi! Thank you for your interesting paper and its implementation! I have a few questions I hope you can clarify: 1. When employing the pre-trained model with a "sink token,"...