Haoyan Luo
# Description

Before this fix, `model_cache["ln_final.hook_normalized"]` returns only the RMS-normalized hidden states, without multiplying by the `final_ln` weight. This might contradict the design of this hook. I followed the...
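To make the distinction in the description above concrete, here is a minimal sketch in plain Python of RMS normalization with and without the learned weight. It is not the library's code: the function names `rms_normalize` and `rms_norm_with_weight`, the `eps` value, and the toy inputs are all assumptions chosen for illustration.

```python
import math

def rms_normalize(x, eps=1e-6):
    """Scale x to (approximately) unit root-mean-square; no learned weight."""
    # eps is a hypothetical stabilizer value, not taken from the source.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def rms_norm_with_weight(x, weight, eps=1e-6):
    """RMS-normalize x, then multiply elementwise by the learned weight."""
    return [n * w for n, w in zip(rms_normalize(x, eps), weight)]

# Toy values for illustration only.
hidden = [3.0, 4.0]
weight = [2.0, 0.5]

normalized = rms_normalize(hidden)            # what the description says was cached
scaled = rms_norm_with_weight(hidden, weight)  # normalized states times the weight
```

The two outputs differ whenever the weight is not all ones, which is why it matters which of them a hook named "normalized" is expected to expose.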
Hi there, thanks for your work! I want to ask about the source of the commonsense_15k dataset, as I couldn't find it described in either the paper or this repo.
Hi! Thank you for your interesting paper and its implementation! I have a few questions I hope you can clarify: 1. When employing the pre-trained model with a "sink token,"...