Andrej

Results 373 comments of Andrej

Confirmed: this fixes the issue, stepping at ~460K tok/s on 4x A100 GPUs.

Hello, ty for the PR. I'm not an expert in cuDNN use; do you have a short explanation for some of these changes? Also, I noticed you edited the dev/cuda...

Running the test with this PR, `make test_gpt2cu USE_CUDNN=1 && ./test_gpt2cu`, actually fails; specifically, the error on the `qkvw` tensor grows from 1.1e-1 to 1.4e-1. So we'd have to dumb...

This was not flagged by our CI because I think it does not turn on `USE_CUDNN=1` in the `make` command.

Sorry for the spam. I noticed that it's not this PR that is "flipping" the test from FAIL to PASS; it's the way we compile, without the use of `USE_CUDNN=1`. Master...

Very cool! I'll take a look and also see if I can find a slurm cluster to play with this on. Do you by any chance have a PyTorch baseline...

Thank you for posting, @chinthysl, very cool. We had a small discussion about it on our Discord with the core devs; please join us sometime on the [CUDA MODE](https://github.com/cuda-mode)...

Nice! This is actually super convenient because it may mean that we could have tests for our training matching that of PyTorch from scratch, without having to save/load checkpoints. We...

Hi @jart, it's nice to see you stop by! I don't think I can merge this because (for educational and historic reasons) I am trying to be compatible with GPT-2...