
[#243] Init From Scratch

Open Neruelin opened this issue 9 months ago • 4 comments

Added gen_base_weights_checkpoint.py to create base weight checkpoints.
Added -c option to train_gpt2.cu to overwrite the load_filename value.
    This allows using a generated base weight checkpoint instead of the
    weights output by train_gpt2.py.
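
For reference, a rough usage example (assuming the usual llm.c make target; the checkpoint filename is just a placeholder for whatever gen_base_weights_checkpoint.py writes):

make train_gpt2cu
./train_gpt2cu -c gpt2_124M_base.bin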

Neruelin avatar Apr 29 '24 10:04 Neruelin

We want to randomize and normalize the weights directly in C, so that no Python is required. So instead of build_from_checkpoint we'd want

pseudo:

init_from_scratch() { for every param x: param[x] = random_float([-1, +1]) }

then, based on which layer it is, we need to normalize and scale again (the sqrt scaling, e.g. 1/sqrt(2 * n_layer) for the residual projections). See _init_weights in nanoGPT. Basically we want the C version of _init_weights, along the lines of the sketch below.
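
A minimal C sketch of that idea (not llm.c's actual API; the function and parameter names here are made up, and the constants — normal(0, 0.02) for weights and embeddings, zeros for biases, and std divided by sqrt(2 * n_layer) for the residual projections — follow nanoGPT's _init_weights):

#include <math.h>
#include <stdlib.h>

// Box-Muller transform: one sample from N(0, std)
static float sample_normalf(float std) {
    float u1 = ((float)rand() + 1.0f) / ((float)RAND_MAX + 2.0f); // in (0, 1)
    float u2 = ((float)rand() + 1.0f) / ((float)RAND_MAX + 2.0f);
    return std * sqrtf(-2.0f * logf(u1)) * cosf(2.0f * 3.14159265f * u2);
}

static void fill_normal(float* p, size_t n, float std) {
    for (size_t i = 0; i < n; i++) { p[i] = sample_normalf(std); }
}

static void fill_zeros(float* p, size_t n) {
    for (size_t i = 0; i < n; i++) { p[i] = 0.0f; }
}

// hypothetical entry point: initialize each parameter group from scratch
void init_from_scratch(float* embeddings, size_t n_emb,
                       float* linear_w, size_t n_linear_w,
                       float* linear_b, size_t n_linear_b,
                       float* resid_proj_w, size_t n_resid_proj_w,
                       int n_layer) {
    fill_normal(embeddings, n_emb, 0.02f);      // token/position embeddings
    fill_normal(linear_w, n_linear_w, 0.02f);   // all other linear weights
    fill_zeros(linear_b, n_linear_b);           // biases start at zero
    // GPT-2 scaled init for the residual projections (c_proj weights)
    fill_normal(resid_proj_w, n_resid_proj_w, 0.02f / sqrtf(2.0f * (float)n_layer));
    // (LayerNorm weights would be set to 1 and biases to 0, the PyTorch default)
}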

azret avatar Apr 29 '24 16:04 azret

Thanks for the feedback. I misunderstood the desired outcome, but a clarification has been added to the issue. I'll try this again in CUDA only, without any dependency on the HF model weights.

Neruelin avatar Apr 29 '24 17:04 Neruelin

Even with this being on the Python side, if it allows us to create dummy models of different sizes, that would already be useful for profiling, to see how the code scales to larger model sizes.

ngc92 avatar Apr 29 '24 19:04 ngc92

@ngc92 You can pull a standalone script if you need it:

gpt2-124M-from-scratch.py

It simply creates a new GPT-2 124M model from scratch and saves the corresponding weights to gpt2_124M.bin. This will be useful once full C/CUDA backprop is ready, to try training from scratch from C.
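
For context, a rough sketch of how such a gpt2_124M.bin could then be consumed from the C side, assuming it uses the fp32 checkpoint layout that train_gpt2.py writes (a 256-int32 header followed by the float32 parameters; the exact header field meanings below are an assumption):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE* f = fopen("gpt2_124M.bin", "rb");
    if (f == NULL) { perror("fopen"); return 1; }

    // 256-int32 header (assumed fields: magic, version, max_seq_len,
    // vocab_size, num_layers, num_heads, channels, ...)
    int header[256];
    if (fread(header, sizeof(int), 256, f) != 256) { fclose(f); return 1; }
    printf("magic=%d version=%d maxT=%d V=%d L=%d NH=%d C=%d\n",
           header[0], header[1], header[2], header[3],
           header[4], header[5], header[6]);

    // the remainder of the file is the flat float32 parameter buffer
    fseek(f, 0, SEEK_END);
    long nbytes = ftell(f) - 256L * (long)sizeof(int);
    fseek(f, 256L * (long)sizeof(int), SEEK_SET);
    float* params = (float*)malloc(nbytes);
    size_t nfloats = fread(params, sizeof(float), nbytes / sizeof(float), f);
    printf("read %zu parameter floats\n", nfloats);

    free(params);
    fclose(f);
    return 0;
}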

azret avatar Apr 29 '24 19:04 azret