llm.c
LLM training in simple, raw C/CUDA
This is a faster version of the cool new kernel from #117 (still /dev/cuda/ only). The biggest difference is that it is optimised for doing one row per 1024-wide block rather...
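As a rough illustration of the "one row per 1024-wide block" layout, here is a serial plain-C emulation (not the actual CUDA kernel): each of the 1024 "threads" first accumulates a strided slice of the row, then the partials are combined in a binary tree, mirroring the `__syncthreads()`-separated phases a block-per-row reduction kernel would use. The function name and the sum operation are illustrative assumptions.

```c
#include <stddef.h>

#define BLOCK 1024  // one "block" of 1024 threads handles one row

// Serial emulation of a one-row-per-block parallel sum reduction.
float reduce_row(const float *row, int C) {
    float partial[BLOCK];
    // phase 1: each "thread" t accumulates a strided slice of the row
    for (int t = 0; t < BLOCK; t++) {
        float acc = 0.0f;
        for (int i = t; i < C; i += BLOCK) acc += row[i];
        partial[t] = acc;
    }
    // phase 2: binary tree reduction of the 1024 partials
    for (int s = BLOCK / 2; s > 0; s /= 2)
        for (int t = 0; t < s; t++)
            partial[t] += partial[t + s];
    return partial[0];
}
```

On a GPU the two phases run in parallel within the block, with a barrier between each tree step.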
This doesn't help us as is, but going forward, it's a first step towards padding the vocab dimension to a sane value that actually allows for fast implementations. I haven't...
`const` keyword additions and one new file, `unistd.h`, in the `platform/windows` directory.
Currently we only ever call the `gpt2_forward` function with a single, fixed setting of `B,T`, for both training and inference, e.g.:

```c
gpt2_forward(&model, gen_tokens, NULL, B, T);
```

However, in principle...
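To make the point concrete, here is a toy stand-in (not the real llm.c API) showing that a forward pass parameterized by `(B, T)` can in principle be called with different shapes for training versus inference, and that passing `NULL` targets skips the loss. The function name and the "loss" (a simple mismatch rate, to keep the sketch self-contained) are illustrative assumptions.

```c
#include <stddef.h>

// Toy forward pass: nothing forces B and T to be fixed across calls.
// targets == NULL means inference mode: no loss is computed.
float toy_forward(const int *tokens, const int *targets, int B, int T) {
    if (targets == NULL) return 0.0f;          // inference: skip loss
    float loss = 0.0f;
    for (int i = 0; i < B * T; i++)            // "loss" = mismatch rate
        loss += (tokens[i] != targets[i]) ? 1.0f : 0.0f;
    return loss / (float)(B * T);
}
```

The same function can be called with, say, `B=4, T=1024` for a training step and `B=1, T=64` for generation.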
I'm working on the C version of the code in preparation for #40. With **no** code modifications to llm.c I observe the following:
- `test_gpt2` works successfully and the loss...
Hi! Sometimes it's tricky or daunting to set up the hardware and environment for a script like this on a self-hosted cloud GPU. I was trying out llm.c on a...
I didn't see the backpropagation code in the train_gpt2.cu file; how does it compute gradients?

```c
// do a training step
clock_gettime(CLOCK_MONOTONIC, &start);
dataloader_next_batch(&train_loader);
gpt2_forward(&model, train_loader.inputs, train_loader.targets, B,...
```
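For intuition about what the backward kernels compute, here is a minimal hand-derived example (a toy, not the actual llm.c kernels): a one-parameter "layer" `y = w*x` with squared-error loss, where the gradient `dloss/dw = 2(y - t) * x` follows from the chain rule. The backward pass in train_gpt2.cu does the same thing at scale, with one hand-written gradient kernel per layer.

```c
// Toy forward: y = w*x, loss = (y - t)^2.
float toy_fwd(float w, float x, float t, float *y) {
    *y = w * x;
    float d = *y - t;
    return d * d;
}

// Toy backward: chain rule gives dloss/dw = 2*(y - t) * x.
float toy_bwd(float w, float x, float t) {
    float y = w * x;
    return 2.0f * (y - t) * x;
}
```

A finite-difference check, `(loss(w+h) - loss(w-h)) / (2h)`, is the standard way to verify such hand-written gradients.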
A larger `thread_reuse_factor` reduces the number of threads launched while increasing the per-thread load. Depending on the value of `B * T * OC` and the GPU card, it is...
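A serial plain-C emulation of the `thread_reuse_factor` idea (the kernel name and the element-wise operation here are illustrative assumptions): instead of launching one "thread" per output element, launch `n / reuse` threads and have each one process `reuse` consecutive elements, trading launch overhead for per-thread work.

```c
#include <stddef.h>

// Serial emulation: fewer "threads", each doing `reuse` elements.
void scale_with_reuse(float *out, const float *in, int n, int reuse) {
    int threads = (n + reuse - 1) / reuse;   // fewer threads launched
    for (int t = 0; t < threads; t++)        // each "thread"...
        for (int k = 0; k < reuse; k++) {    // ...handles `reuse` items
            int i = t * reuse + k;
            if (i < n) out[i] = 2.0f * in[i];
        }
}
```

The best value of `reuse` depends on `B * T * OC` and the card, which is why the snippet suggests tuning it.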
Why do I encounter this problem? My CPU is a 13900KF and my memory is 32 GB.
How much GPU RAM do I need? I tried training on my GTX 1650 with 4GB of RAM. Batch size is already 4, meaning that's going to be difficult to...
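A back-of-the-envelope estimate of why 4GB is tight (rough assumptions: fp32 training, the 124M-parameter GPT-2, and AdamW): the optimizer state alone keeps four float copies per parameter (weights, gradients, and the two Adam moments), before any activations, which themselves scale with `B * T`.

```c
// Rough VRAM estimate for fp32 training state with AdamW:
// 4 float copies per parameter (weights, grads, Adam m, Adam v).
double train_state_gib(long long n_params) {
    const int copies = 4;
    const int bytes_per_float = 4;
    return (double)n_params * copies * bytes_per_float
           / (1024.0 * 1024.0 * 1024.0);
}
```

For ~124M parameters this already comes to roughly 1.85 GiB, leaving little of a 4GB card for activations at batch size 4.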