llm.c

LLM training in simple, raw C/CUDA

361 llm.c issues

This is a faster version of the cool new kernel from #117 (still /dev/cuda/ only). The biggest difference is that it is optimised for doing one row per 1024-wide block rather...
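For context, a minimal sketch of the row-per-block reduction pattern the excerpt describes, shown on a generic row sum rather than the actual #117 kernel (the kernel name and shapes here are illustrative assumptions):

```cuda
// Illustrative only: each 1024-thread block cooperatively reduces one row
// of an (N, C) matrix; threads stride over the row, then tree-reduce in
// shared memory.
__global__ void row_sum_kernel(float* out, const float* inp, int N, int C) {
    extern __shared__ float shared[];
    int row = blockIdx.x;   // one row per block
    int tid = threadIdx.x;  // 0..1023
    if (row >= N) return;
    const float* x = inp + (size_t)row * C;
    // each thread accumulates a strided partial sum over the row
    float sum = 0.0f;
    for (int i = tid; i < C; i += blockDim.x) {
        sum += x[i];
    }
    shared[tid] = sum;
    __syncthreads();
    // tree reduction in shared memory (blockDim.x is a power of two)
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) shared[tid] += shared[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[row] = shared[0];
}
// launch: row_sum_kernel<<<N, 1024, 1024 * sizeof(float)>>>(out, inp, N, C);
```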

This doesn't help us as is, but going forward, it's a first step towards padding the vocab dimension to a sane value that actually allows for fast implementations. I haven't...
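The padding idea in rough form: round `V` up to a friendly multiple so the classifier matmul gets well-aligned dimensions. A hedged sketch (the multiple of 128 and the helper name are assumptions, not the PR's code):

```c
// Hypothetical helper: round the vocab size up to a multiple of `m` so the
// final (B*T, C) @ (C, Vp) matmul hits fast, aligned kernel paths.
int pad_vocab(int V, int m) {
    return ((V + m - 1) / m) * m;
}
// e.g. GPT-2's V = 50257 would pad to pad_vocab(50257, 128) = 50304;
// the extra logit columns carry no real tokens and can be ignored/masked.
```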

`const` keyword additions and one new file, `unistd.h`, in the platform/windows directory.
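A guess at what such a shim might contain, since MSVC ships no `<unistd.h>`; the actual file in the PR may map a different set of calls:

```c
// Hypothetical minimal platform/windows/unistd.h: forward the few POSIX
// calls used by llm.c to their Win32 CRT equivalents.
#ifndef UNISTD_H
#define UNISTD_H
#include <io.h>       /* _access */
#include <process.h>  /* _getpid */
#define access _access
#define F_OK 0        /* test for file existence */
#endif
```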

Currently we only ever call the `gpt2_forward` function with a single, fixed setting of `B,T`, for both training and inference, e.g.:

```c
gpt2_forward(&model, gen_tokens, NULL, B, T);
```

However, in principle...

feature-request
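A hedged sketch of what the request amounts to; the second call is hypothetical, since today the activation buffers are sized once for a single fixed `B,T`:

```c
// today: training and inference must share one fixed B,T
gpt2_forward(&model, train_loader.inputs, train_loader.targets, B, T);
// requested (hypothetical): a cheaper shape for inference, e.g. one
// 64-token sequence, without rebuilding the model's activation buffers
gpt2_forward(&model, gen_tokens, NULL, 1, 64);
```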

I'm working on the C version of the code in preparation for #40. With llm.c and **no** code modifications, I observe the following:

- `test_gpt2` works successfully and the loss...

Hi! Sometimes it's tricky or daunting to set up the hardware and environment for a script like this on a self-hosted cloud GPU. I was trying out llm.c on a...

I didn't see the implementation of the backpropagation code in the train_gpt2.cu file. How does it compute gradients?

```c
// do a training step
clock_gettime(CLOCK_MONOTONIC, &start);
dataloader_next_batch(&train_loader);
gpt2_forward(&model, train_loader.inputs, train_loader.targets, B,...
```
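For reference, the CPU version (`train_gpt2.c`) computes gradients in an explicit `gpt2_backward` call right after the forward pass; assuming the CUDA file mirrors that structure, a full step looks roughly like this (function names from the C version, hyperparameters illustrative):

```c
// rough shape of one training step in train_gpt2.c; the .cu file may differ
dataloader_next_batch(&train_loader);
gpt2_forward(&model, train_loader.inputs, train_loader.targets, B, T);
gpt2_zero_grad(&model);  // clear gradient buffers
gpt2_backward(&model);   // backprop through the cached activations
gpt2_update(&model, 1e-4f, 0.9f, 0.999f, 1e-8f, 0.0f, step + 1); // AdamW
```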

A larger `thread_reuse_factor` reduces the number of threads launched while increasing the per-thread load. Depending on the value of `B * T * OC` and the GPU card, it is...
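The tradeoff in sketch form, on a generic elementwise kernel rather than the PR's actual one (`reuse` plays the role of `thread_reuse_factor`):

```cuda
// Illustrative only: each thread handles `reuse` grid-strided elements, so
// the launched grid shrinks by that factor and per-thread load grows.
__global__ void scale_kernel(float* x, float s, int n, int reuse) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int r = 0; r < reuse; r++) {
        int i = idx + r * stride;
        if (i < n) x[i] *= s;
    }
}
// launch a grid sized for n / reuse elements:
// int threads = 512;
// int blocks = ((n + reuse - 1) / reuse + threads - 1) / threads;
// scale_kernel<<<blocks, threads>>>(x, 2.0f, n, reuse);
```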

![image](https://github.com/karpathy/llm.c/assets/47049287/10a032a8-ec81-4649-b87f-fa3b17cd903b)

Why do I encounter this problem? My CPU is a 13900KF and I have 32 GB of memory.

How much GPU RAM do I need? I tried training on my GTX 1650 with 4 GB of RAM. The batch size is already 4, meaning that's going to be difficult to...
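A back-of-envelope answer under fp32 AdamW assumptions (4 bytes each for the parameters, gradients, and the two Adam moment buffers; numbers are rough):

```c
#include <stdio.h>

int main(void) {
    long long n_params = 124LL * 1000 * 1000;     // GPT-2 small, ~124M params
    long long bytes = n_params * (4 + 4 + 4 + 4); // params + grads + m + v
    printf("~%.2f GiB before activations\n",
           bytes / (1024.0 * 1024.0 * 1024.0));   // ~1.85 GiB
    return 0;
}
```

On top of that, the naive fp32 activations at B=4, T=1024 add several more GiB (the attention buffers alone are B*NH*T*T floats per layer), which is why a 4 GB card is a tight fit.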