Andrej comments

Results: 373 comments of Andrej

Evaluation script for Huggingface Causal models

Also, another reason this code will fail is that it hardcodes the max context length to be 2048:
```python
while input_ids.shape[-1] > 2048:
```
but e.g. GPT-2 has max context length...
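A minimal sketch of the alternative to a hardcoded limit: pass the model's context length in as a parameter. The helper name `truncate_left` is hypothetical and not part of the script under discussion, and the value 1024 assumes GPT-2's published context length. With Hugging Face models the limit is usually readable from the model config, which is also an assumption to verify per model.

```python
def truncate_left(input_ids, max_context_len):
    """Keep only the last `max_context_len` token ids (hypothetical helper).

    Dropping tokens from the left preserves the end of the prompt,
    which is where the question being evaluated normally sits.
    """
    if max_context_len <= 0:
        raise ValueError("max_context_len must be positive")
    return input_ids[-max_context_len:]


# Assumed value: GPT-2's context length. Other models differ, which is
# why the limit should come from the model rather than a constant.
GPT2_CONTEXT_LEN = 1024

prompt_ids = list(range(3000))  # stand-in for a tokenized prompt
trimmed = truncate_left(prompt_ids, GPT2_CONTEXT_LEN)
```

A prompt that already fits is returned unchanged, so the same call works for short and long inputs.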

Evaluation script for Huggingface Causal models

I noticed one more bug. The issue is that there is a double space in the intro line, right before the subject. This is because
```python
def format_subject(subject):
    l =...
```
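The quoted function is cut off above, so the following is only a hedged reconstruction of how such a double space can arise, not the script's actual code. It assumes a formatter that prefixes every word with a space and a template that already has a space before the placeholder. The names `format_subject_leading_space`, `format_subject_clean` and `TEMPLATE` are hypothetical.

```python
def format_subject_leading_space(subject):
    # Hypothetical buggy variant: every word gets a leading space,
    # so the result itself starts with a space.
    return "".join(" " + word for word in subject.split("_"))


def format_subject_clean(subject):
    # Joining with single spaces leaves no leading or trailing space.
    return " ".join(subject.split("_"))


# Assumed intro template: note the space already present before "{}".
TEMPLATE = "The following are multiple choice questions (with answers) about {}."

buggy = TEMPLATE.format(format_subject_leading_space("high_school_physics"))
fixed = TEMPLATE.format(format_subject_clean("high_school_physics"))
```

The extra space matters for evaluation because it changes the tokenization of the prompt, so even a cosmetic difference can shift scores.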

Would like to contribute FSDP functionality

It's def on my todo list to incorporate FSDP into nanoGPT but I haven't looked into it in detail just yet. I also know that FSDP internals are being actively...

Fix the potential int overflow related to BTV

In lines like `const size_t N = (size_t)(B) * T * V;`, is the explicit cast needed?

Fix the potential int overflow related to BTV

Sorry I meant the casts look ugly to my eye. Maybe we could make the individual params `size_t` in the function declarations 🤔, so their products will come out `size_t`...
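A worked example of why the product needs a wider type, sketched in Python for the arithmetic only. The values of `B` and `T` are assumptions chosen for illustration, and `V = 50257` assumes the GPT-2 vocabulary size. `ctypes.c_int32` is used to mimic two's-complement wrap-around; in C, signed overflow is undefined behavior, so wrapping is a common outcome rather than a guarantee.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a 32-bit signed int can hold


def btv_product(B, T, V):
    """Return the exact product and its value after 32-bit wrap-around."""
    exact = B * T * V                      # Python ints never overflow
    wrapped = ctypes.c_int32(exact).value  # mimics a 32-bit signed result
    return exact, wrapped


# Assumed sizes: batch 64, sequence length 1024, GPT-2 vocabulary.
exact, wrapped = btv_product(64, 1024, 50257)
overflows = exact > INT32_MAX
```

If the operands are `size_t` from the start, as the comment suggests for the function declarations, the multiplication is carried out in the wider type and no cast is needed at the call site.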

Bump json5 and webpack-cli in /assets

oops this PR now conflicts because I merged the other one. Sounds good, agree it is ok to skip += here, but I think it should come with a comment...

Fix the bug that yields cpu, gpu results mismatch in crossentropy_softmax_backward.cu

we can't just malloc on repeat, without free. maybe memset to zero if needed?

Further improvements to attention backward

I merged the previous PR, so this one should be ready. ACK on using `=` instead of `+=` in the backward pass. I didn't even realize originally that this would...

Further improvements to attention backward

Also one possible request - I think a lot of people will come to dev/cuda to learn CUDA. If you're able to comment some of the kernels I think it could...

Further improvements to attention backward

So cool, I went down from 400ms/iter -> 200ms/iter.