llm.c
[#243] Init From Scratch
Added gen_base_weights_checkpoint.py to create base weight checkpoints
Added -c option to train_gpt2.cu to override the load_filename value
This allows using a generated base weight checkpoint instead of the
weights output by train_gpt2.py.
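For reference, a minimal sketch of what the -c override could look like inside train_gpt2.cu's argument parsing; the default filename and the loop shape are assumptions for illustration, not necessarily the exact code in this PR:

```c
// hypothetical sketch of the -c override in train_gpt2.cu's arg parsing;
// load_filename comes from the PR description, the default value is an assumption
const char* load_filename = "gpt2_124M.bin";  // default: weights written by train_gpt2.py
for (int i = 1; i < argc; i += 2) {
    if (i + 1 >= argc) { break; }  // expect "-x <value>" pairs
    if (argv[i][1] == 'c') { load_filename = argv[i + 1]; }  // -c <file>: use a generated base weight checkpoint
}
```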
We want to randomize and normalize the weights directly in C, so that no Python is required. So instead of build_from_checkpoint we'd want, in pseudocode:
init_from_scratch() { for every param: param[i] = random_float(-1, +1) }
then, based on which layer it is, we need to normalize and scale again (sqrt). See _init_weights in nanoGPT. Basically we want the C version of _init_weights.
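A minimal C sketch of what that could look like, following nanoGPT's _init_weights (normal init with std 0.02, biases zero, layernorm scales one, and the residual projections scaled by 1/sqrt(2*n_layer)); the parameter names and how they map onto llm.c's tensors are assumptions, not the final wiring:

```c
// minimal sketch of init_from_scratch in C, mirroring nanoGPT's _init_weights;
// tensor names and the parameter list here are assumptions for illustration
#include <math.h>
#include <stdlib.h>

// sample from N(0, 1) with Box-Muller
static float randn(void) {
    float u1 = ((float)rand() + 1.0f) / ((float)RAND_MAX + 2.0f);  // keep u1 in (0, 1) so logf is finite
    float u2 = ((float)rand() + 1.0f) / ((float)RAND_MAX + 2.0f);
    return sqrtf(-2.0f * logf(u1)) * cosf(2.0f * 3.14159265f * u2);
}

// fill a flat float buffer with N(0, std^2)
static void fill_normal(float* p, size_t n, float std) {
    for (size_t i = 0; i < n; i++) { p[i] = std * randn(); }
}

// fill a flat float buffer with a constant
static void fill_const(float* p, size_t n, float val) {
    for (size_t i = 0; i < n; i++) { p[i] = val; }
}

// GPT-2 style init: embeddings and linear weights ~ N(0, 0.02^2), biases 0,
// layernorm weights 1 / biases 0, and the residual projection weights scaled
// down by 1/sqrt(2 * n_layer), as in nanoGPT's _init_weights
void init_from_scratch(float* wte, size_t wte_n,
                       float* attn_w, size_t attn_w_n,
                       float* attn_projw, size_t attn_projw_n,
                       float* ln_w, size_t ln_w_n,
                       float* ln_b, size_t ln_b_n,
                       int n_layer) {
    float std = 0.02f;
    float resid_std = std / sqrtf(2.0f * (float)n_layer);
    fill_normal(wte, wte_n, std);
    fill_normal(attn_w, attn_w_n, std);
    fill_normal(attn_projw, attn_projw_n, resid_std);  // residual projection gets the extra sqrt scaling
    fill_const(ln_w, ln_w_n, 1.0f);
    fill_const(ln_b, ln_b_n, 0.0f);
    // ...repeat the same pattern for the remaining parameter tensors
}
```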
Thanks for the feedback. I misunderstood the desired outcome, but a clarification was added to the issue. I'll try this again in CUDA only, without any dependency on the HF model weights.
Even with this being on the Python side, if it allows us to create dummy models of different sizes, that would already be useful for profiling, to see how the code scales to larger model sizes.
@ngc92 You can pull a standalone script if you need it:
- gpt2-124M-from-scratch.py
Simply creates a new GPT-2 124M model from scratch and saves the corresponding weights to gpt2_124M.bin. This will be useful once the full C/CUDA backprop is ready, so we can try training from scratch in C.
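Once that .bin exists, the C side should be able to pick it up with the existing checkpoint loader, assuming the script writes the same file layout that train_gpt2.py produces; a hedged usage sketch of the relevant lines inside train_gpt2.c:

```c
// usage sketch: load the from-scratch weights with the existing llm.c loader,
// assuming gpt2-124M-from-scratch.py writes the same .bin layout as train_gpt2.py
GPT2 model;
gpt2_build_from_checkpoint(&model, "gpt2_124M.bin");
// ...then run the usual forward/backward/update loop once C/CUDA backprop lands
```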