zerok

1 issue by zerok

I didn't see the backpropagation code implemented in the train_gpt2.cu file. How does it compute gradients?

```c
// do a training step
clock_gettime(CLOCK_MONOTONIC, &start);
dataloader_next_batch(&train_loader);
gpt2_forward(&model, train_loader.inputs, train_loader.targets, B,...
```
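For context on the question: C and CUDA code without an autograd engine typically gets gradients from hand-written backward functions that apply the chain rule layer by layer, in reverse order of the forward pass. The sketch below is an illustrative assumption, not code taken from train_gpt2.cu. The names `linear_forward`, `linear_backward`, `mse_loss` and `linear_grad_check` are hypothetical. It shows a CPU-only forward/backward pair for one fully connected layer, plus a finite-difference check that the hand-derived gradient matches the forward pass.

```c
#include <assert.h>

/* Hypothetical forward pass of a fully connected layer:
   out = inp * weight^T + bias, all row-major.
   inp: [N, C], weight: [OC, C], bias: [OC], out: [N, OC]. */
static void linear_forward(float *out, const float *inp, const float *weight,
                           const float *bias, int N, int C, int OC) {
    assert(N > 0 && C > 0 && OC > 0);
    for (int n = 0; n < N; n++) {
        for (int o = 0; o < OC; o++) {
            float acc = bias[o];
            for (int c = 0; c < C; c++) {
                acc += inp[n * C + c] * weight[o * C + c];
            }
            out[n * OC + o] = acc;
        }
    }
}

/* Hypothetical backward pass: given dout = dL/dout, accumulate
   dL/dinp, dL/dweight and dL/dbias via the chain rule.
   Gradients are accumulated with +=, so callers must zero them first. */
static void linear_backward(float *dinp, float *dweight, float *dbias,
                            const float *dout, const float *inp,
                            const float *weight, int N, int C, int OC) {
    for (int n = 0; n < N; n++) {
        for (int o = 0; o < OC; o++) {
            float d = dout[n * OC + o];
            dbias[o] += d;
            for (int c = 0; c < C; c++) {
                dinp[n * C + c] += d * weight[o * C + c];
                dweight[o * C + c] += d * inp[n * C + c];
            }
        }
    }
}

/* Loss = 0.5 * sum((out - target)^2), so dL/dout = out - target. */
static float mse_loss(const float *out, const float *target, int size) {
    float loss = 0.0f;
    for (int i = 0; i < size; i++) {
        float diff = out[i] - target[i];
        loss += 0.5f * diff * diff;
    }
    return loss;
}

/* Compares the analytic dweight against a central finite difference
   and returns the largest absolute discrepancy. */
static float linear_grad_check(void) {
    enum { N = 2, C = 3, OC = 2 };
    float inp[N * C] = {0.5f, -1.0f, 2.0f, 1.5f, 0.25f, -0.75f};
    float weight[OC * C] = {0.1f, 0.2f, -0.3f, 0.4f, -0.5f, 0.6f};
    float bias[OC] = {0.05f, -0.1f};
    float target[N * OC] = {1.0f, 0.0f, -1.0f, 0.5f};
    float out[N * OC], dout[N * OC];
    float dinp[N * C] = {0}, dweight[OC * C] = {0}, dbias[OC] = {0};

    /* Forward, loss gradient, then backward. */
    linear_forward(out, inp, weight, bias, N, C, OC);
    for (int i = 0; i < N * OC; i++) dout[i] = out[i] - target[i];
    linear_backward(dinp, dweight, dbias, dout, inp, weight, N, C, OC);

    /* Perturb each weight by +/- eps and re-run the forward pass. */
    const float eps = 1e-2f;
    float max_err = 0.0f;
    for (int i = 0; i < OC * C; i++) {
        float saved = weight[i];
        weight[i] = saved + eps;
        linear_forward(out, inp, weight, bias, N, C, OC);
        float loss_plus = mse_loss(out, target, N * OC);
        weight[i] = saved - eps;
        linear_forward(out, inp, weight, bias, N, C, OC);
        float loss_minus = mse_loss(out, target, N * OC);
        weight[i] = saved;
        float numeric = (loss_plus - loss_minus) / (2.0f * eps);
        float err = numeric - dweight[i];
        if (err < 0.0f) err = -err;
        if (err > max_err) max_err = err;
    }
    return max_err;
}
```

Calling `linear_grad_check()` should return a value close to zero, which indicates that the hand-derived `linear_backward` formulas agree with `linear_forward`. A full model would chain one such backward function per layer, starting from the loss and walking back to the embeddings.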