
Pretraining (with CPUs)

bitmarkcc opened this issue 1 year ago • 5 comments

I'm new to deep learning but have some experience with training boosted-decision-trees.

Is this just for fine-tuning, or for pretraining as well? When I look inside train_gpt2.c, I see that the first thing it does is load weights from a bin file (gpt2_124M.bin). Where did this bin file come from? Is it an official file released by OpenAI? I would like to be able to start from scratch.

I would like to first see how pretraining works, even if it's just on a small dataset, and it doesn't need to use GPUs. I would like to start with CPUs, and maybe later add CPU-only nodes that can each work on part of the training.
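For context on what a .bin checkpoint like gpt2_124M.bin contains: it is essentially a small integer header describing the model shape, followed by the parameters as raw float32. The sketch below is my own illustration of that idea, not llm.c's actual loader or file layout:

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical minimal checkpoint reader (the real llm.c format differs):
// read a fixed-size int header, then n_params raw float32 weights.
float *load_checkpoint(const char *path, int *header, int header_len, size_t n_params) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) { fprintf(stderr, "could not open %s\n", path); return NULL; }
    if (fread(header, sizeof(int), header_len, f) != (size_t)header_len) {
        fclose(f);
        return NULL;
    }
    float *params = (float *)malloc(n_params * sizeof(float));
    if (params == NULL || fread(params, sizeof(float), n_params, f) != n_params) {
        free(params);
        fclose(f);
        return NULL;
    }
    fclose(f);
    return params;
}
```

Starting from scratch then just means filling that parameter buffer with random values instead of reading it from disk.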

bitmarkcc avatar Jul 01 '24 09:07 bitmarkcc

I see that train_gpt2.cu has a gpt2_build_from_random() for training from scratch. I can attempt to copy that into train_gpt2.c, but I'm not sure how easy it will be. Are there any forks doing this?

What I would like to see is platform-independent code (no reliance on Nvidia or AMD). People who have those devices (or ASICs) could use optimized code, but there should always be a fallback to the platform-independent implementation.
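One common way to get that fallback behavior is a compile-time dispatch: the portable C kernel is always compiled, and an accelerated path is selected only when the vendor toolchain is available. A hypothetical sketch (llm.c itself splits this across train_gpt2.c and train_gpt2.cu rather than dispatching in one file, and USE_CUDA here is a made-up build flag):

```c
#include <stddef.h>

// Portable reference matmul: out[M][N] = a[M][K] * b[K][N].
// Always compiled, so every platform has a working path.
static void matmul_cpu(float *out, const float *a, const float *b,
                       int M, int K, int N) {
    for (int m = 0; m < M; m++) {
        for (int n = 0; n < N; n++) {
            float acc = 0.0f;
            for (int k = 0; k < K; k++) acc += a[m * K + k] * b[k * N + n];
            out[m * N + n] = acc;
        }
    }
}

// Dispatch: use the accelerated kernel only when it was compiled in.
void matmul(float *out, const float *a, const float *b, int M, int K, int N) {
#ifdef USE_CUDA
    matmul_cuda(out, a, b, M, K, N);  // optimized vendor path (hypothetical)
#else
    matmul_cpu(out, a, b, M, K, N);   // platform-independent fallback
#endif
}
```

The same pattern extends to any kernel: each operation gets one reference implementation plus zero or more optional optimized ones.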

Edit: I think this does most of what I want, though I still need to add a way to pass the model and training parameters on the command line: https://github.com/bitmarkcc/llm.c/commit/bdff450a5cbd97a9a23b1013b0a8d27de7cb6e65

bitmarkcc avatar Jul 02 '24 09:07 bitmarkcc

Hey @bitmarkcc! Did you follow the README?

You should first run the Python code; it generates all the necessary bin/state files before you run the C/CUDA code.
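For reference, the CPU quick-start is roughly the following sequence (paraphrasing the README from memory; script paths and names may have changed between versions, so check the current README for the exact commands):

```shell
# roughly the llm.c CPU quick-start (verify against the current README)
pip install -r requirements.txt
python train_gpt2.py          # writes gpt2_124M.bin plus debug/state files
make train_gpt2               # build the CPU trainer
OMP_NUM_THREADS=8 ./train_gpt2
```

The key point is the ordering: the Python step produces the .bin files that train_gpt2.c expects to find at startup.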

If something is not clearly explained in the README, either open a PR fixing it or reply back here; happy to help.

gordicaleksa avatar Jul 05 '24 20:07 gordicaleksa

Yeah, so according to the README these can be generated with train_gpt2.py, and it references the official implementations of GPT-2 from OpenAI and Hugging Face. So these were generated by that Python script? And if you run the C program, does it reproduce the same bin files?

In any case, I am still wondering whether my implementation of pretraining for CPU mode is fine (https://github.com/bitmarkcc/llm.c/commit/bdff450a5cbd97a9a23b1013b0a8d27de7cb6e65). I want to make more changes, and I can open a pull request later on.

Edit: I think it now actually randomizes the parameters (2nd commit): https://github.com/bitmarkcc/llm.c/commit/7581695bc761ff6a755d0745b2530d54f1f4bd03

bitmarkcc avatar Jul 06 '24 08:07 bitmarkcc

@gordicaleksa I only have an NVIDIA 4050 GPU (I have no A100 or V100). Can the code run on my GPU?

brisker avatar Jul 31 '24 16:07 brisker