Alex Loftus

Results: 68 comments by Alex Loftus

@ebridge2 This can probably be closed? I think we are having the copy-editor do this

@dirkgr Here's a pretty basic check on this. I got the activations in every layer for a single prompt, then averaged over the batch and hidden dimensions to get the average...
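For reference, the reduction described above could be sketched like this. Everything here is illustrative, not the actual code: the activation tensor is assumed to be stacked into shape `[n_layers, batch, seq_len, hidden]`, and the sequence dimension is averaged along with batch and hidden so each layer reduces to a scalar.

```python
import numpy as np

# Hypothetical stacked activations: [n_layers, batch, seq_len, hidden]
rng = np.random.default_rng(0)
acts = rng.normal(size=(12, 4, 16, 64))

# Average over batch, sequence, and hidden dimensions,
# leaving one mean activation value per layer.
per_layer_mean = acts.mean(axis=(1, 2, 3))

print(per_layer_mean.shape)  # (12,)
```

With a real model you would collect the per-layer tensors first (e.g. via forward hooks or an option like `output_hidden_states` in Hugging Face `transformers`) and then apply the same reduction.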

I also plotted pairwise $\lambda$ values, where $\lambda = \frac{1}{n_{\text{layers}}} \log \frac{\|v'\|}{\|v\|}$, $v'$ is the activation at layer $i$, and $v$ is the activation at layer $i+1$, as well as the case where $v'$ is the...

> Why does the mean of the activations start below zero? The mean of the weights of every module in `self_attn` in the first layer is (slightly) negative, and there's...

idk if we need this, dude. I'm strongly against scope creep at this point.

Isn't the goal of Mojo to be a drop-in replacement for Python? How could that be achieved if Mojo deviates from Python's function names? I'd expect that most users...