Andrej

Results: 373 comments by Andrej

I don't think I'll add cmake to the project. What is cmake fixing here that can't be done in our Makefile?

This sounds cool. I guess you only tried it on the little Shakespeare training run; I wonder if the slight accuracy decrease could cause training instabilities. Probably should try a bigger...

So:
- the weights are quantized once during model export
- the data (activations) are quantized dynamically on demand during the forward pass
- however, I'd expect not all layers are...
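To make the split between the first two points concrete, here is a minimal sketch in plain Python. It assumes symmetric per-tensor int8 quantization, which may not be the scheme the project actually uses, and the names `quantize`, `export_weights`, and `quantized_matvec` are hypothetical helpers made up for illustration.

```python
def quantize(values):
    """Symmetric int8 quantization: returns (integer codes, scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Guard against an all-zero tensor so we never divide by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def export_weights(weight_rows):
    """Export time: quantize every weight row once and keep its scale."""
    return [quantize(row) for row in weight_rows]


def quantized_matvec(q_weights, activations):
    """Forward pass: quantize activations on demand, then do integer dot products."""
    q_x, x_scale = quantize(activations)
    out = []
    for q_row, w_scale in q_weights:
        acc = sum(w * x for w, x in zip(q_row, q_x))  # integer accumulation
        out.append(acc * w_scale * x_scale)  # rescale back to float
    return out


# Toy example: compare against the float matvec.
weights = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.75]]
x = [1.0, 2.0, -1.0]

q_weights = export_weights(weights)  # happens once, at export
y = quantized_matvec(q_weights, x)   # activations quantized per call
exact = [sum(w * v for w, v in zip(row, x)) for row in weights]
max_err = max(abs(a - b) for a, b in zip(y, exact))
```

The weight codes and scales are computed once and reused on every call, while the activation scale is recomputed for each input, since activations are only known at run time.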

Sorry, I don't understand the history/context for this change. Is it following up on some conversation? Why are the args being changed around?

The code works accidentally because 21 > 3, right?

Hi, this looks ok to me.
- the coding style in kernel6 is a bit off, e.g. I think you're using black? The cropping of the lines and such looks...

Hi @KarhouTam we just merged a LayerNorm forward, I'm not 100% sure how this version is similar or different now.

Ok, yes, this is probably a good idea 😅. I'll leave some comments.

One more thing to be careful with and think about 🤔. If the process crashes or hangs and gets restarted, it starts to log again from the last checkpoint....

I don't want to bloat the main README file too much. Maybe we can have extra helpful instructions inside our `doc/` folder and link to them or something.