Azret Botash
Same error with 8192:

```
[CUDA ERROR] at file D:\SRC\llm.c\train_gpt2_fp32.cu:1079: too many resources requested for launch
```

```c
void fused_classifier3(float* logits, float* losses, const float* dlosses, const int* targets, int B, int...
```
Adding `__launch_bounds__(1024, 1)` bypassed the "too many resources" error. However, all the losses go kaboom with `block_size = 1024`, while everything passes with `block_size = 32`.

```
// With block_size=1024 allocated...
```
I'm running fp32 btw. B = 4, T = 64. So pretty small batch.
It's Windows.
This is cool, though it fails for me. Is there a cross-platform lib that could do the unzip instead of calling an external process?

```
# unzip the file...
```
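One option, sketched below: Python's stdlib `zipfile` module extracts archives without spawning an external process, so it works the same on Windows and Unix (the `unzip` helper name here is just illustrative, not something from the repo):

```python
import zipfile

def unzip(archive_path, dest_dir):
    # extractall creates dest_dir (and subdirectories) as needed,
    # so no external `unzip` binary is required on any platform
    with zipfile.ZipFile(archive_path, "r") as zf:
        zf.extractall(dest_dir)
```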
Thank you for the update. It now works on Windows as is.
Try rebuilding your data files with train_gpt3.py; the tokenizer headers have changed.
We want to randomize and normalize the weights directly in C, so that no Python is required. So instead of `build_from_check_point` we'd want, in pseudocode:

```
init_from_scratch() {
    for every param...
}
```
@ngc92 You can pull a standalone script from [gpt2-124M-from-scratch.py](https://github.com/karpathy/llm.c/pull/156/files#diff-2157512e3cabc9347c7220b8f9112e87da9a8adc02b666d1e5438f9b04ad5b5d) if you need it.

1. gpt2-124M-from-scratch.py simply creates a new GPT-2 124M model from scratch and saves the corresponding weights to...