massive-activations
Code accompanying the paper "Massive Activations in Large Language Models"
Hi! Interesting work on the role of explicit bias! I was wondering what training settings got you an eval PPL of ~3.04. The paper mentions that 50K iterations are required...
Interesting work! I have a question, as stated in the title: did you conduct an experiment like that? What was the result? Thanks.
1. How do you get the mean value of a massive activation, e.g. 2546.8/-1502.0 in hook.py? 2. The mean value is still large; what is the difference between using the mean value and using...
an -> a
Hello, this is great work! I wonder which layer the analyzed activations are from. The last layer?
Hello, I am interested in the standard deviation of the activation and would like to know how the variance is calculated. Here are a few methods: 1. Calculate the variance...