
train_gpt2.cu does not have a backward pass?

yuanrongxi opened this issue 10 months ago · 1 comment

I didn't see an implementation of the backpropagation code in the train_gpt2.cu file. How does it compute gradients?

        // do a training step
        clock_gettime(CLOCK_MONOTONIC, &start);
        dataloader_next_batch(&train_loader);
        gpt2_forward(&model, train_loader.inputs, train_loader.targets, B, T);
        // gpt2_zero_grad(&model);
        // gpt2_backward(&model);
        // gpt2_update(&model, 1e-4f, 0.9f, 0.999f, 1e-8f, 0.0f, step+1);
        cudaCheck(cudaDeviceSynchronize()); // finish all CUDA work to get correct precise timings
        clock_gettime(CLOCK_MONOTONIC, &end);
        double time_elapsed_s = (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("step %d: train loss %f (took %f ms)\n", step, model.mean_loss, time_elapsed_s * 1000);

yuanrongxi · Apr 15 '24 11:04

The CUDA backprop is not ready yet. Andrej is still working on it. We're all waiting :)

azret · Apr 15 '24 15:04

The new version already has the backward pass.
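For reference, a minimal sketch of what the training step looks like once the backward and update calls are enabled, based only on the commented-out lines quoted above (the gpt2_update arguments there appear to be learning rate, beta1, beta2, epsilon, weight decay, and the step count); check the current train_gpt2.cu for the actual code.

        // sketch: the same training step with the backward path enabled,
        // mirroring the calls that were commented out in the snippet above
        clock_gettime(CLOCK_MONOTONIC, &start);
        dataloader_next_batch(&train_loader);
        gpt2_forward(&model, train_loader.inputs, train_loader.targets, B, T);
        gpt2_zero_grad(&model);  // clear accumulated gradients from the previous step
        gpt2_backward(&model);   // backpropagate through the whole model
        gpt2_update(&model, 1e-4f, 0.9f, 0.999f, 1e-8f, 0.0f, step+1); // optimizer step
        cudaCheck(cudaDeviceSynchronize()); // finish all CUDA work for precise timings
        clock_gettime(CLOCK_MONOTONIC, &end);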

yuanrongxi · Apr 21 '24 15:04