massive-activations
the standard deviation of the activation
Hello, I am interested in the standard deviation of the activations and would like to know how the variance is calculated. I can think of a few possibilities:
- Compute the variance over 100 sequences and report it for one specific layer, as in the table below.
- Compute the variance over 100 sequences, restricted to the layers with relatively large values (e.g., layers 2-30).
- Compute the variance over all layers.
Could you please specify which of these was actually used?
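To make the question concrete, here is a rough sketch of how I understand the three options. Everything in it is my own assumption rather than the repository's code: the random stand-in data, the names `acts`, `layer_std`, and `layers_std`, the 0-based layer indexing, and the choice of population standard deviation (the square root of the variance) pooled over all values of a layer.

```python
import random
import statistics

# Hypothetical stand-in data: acts[s][l] is a flat list of activation values
# for sequence s at layer l. Real values would come from the model's hidden states.
NUM_SEQUENCES, NUM_LAYERS, VALUES_PER_LAYER = 100, 32, 64
rng = random.Random(0)
acts = [
    [[rng.gauss(0.0, 1.0 + layer) for _ in range(VALUES_PER_LAYER)]
     for layer in range(NUM_LAYERS)]
    for _ in range(NUM_SEQUENCES)
]

def layer_std(acts, layer):
    """Population std of all values at one layer, pooled over every sequence."""
    pooled = [value for seq in acts for value in seq[layer]]
    return statistics.pstdev(pooled)

def layers_std(acts, layers):
    """Per-layer std for each layer index in `layers`."""
    return {layer: layer_std(acts, layer) for layer in layers}

single = layer_std(acts, 5)                       # option 1: one specific layer
subset = layers_std(acts, range(2, 31))           # option 2: layers 2-30 inclusive
all_layers = layers_std(acts, range(NUM_LAYERS))  # option 3: every layer
```

If the reported numbers were instead computed per sequence and then averaged, or with the sample (n-1) estimator, the results would differ slightly, so it would help to know that as well.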
Thanks.