Azret Botash
I'm working on Windows and see the same with MSVC. I use /O2 and /fp:fast. I think I've nailed it down to the GELU layer. Something is being over...
Another thing to note is that when compiling without fast fp, one of the check tensors fails compared to the PyTorch version. It's a very small delta but definitely goes away when...
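To put a number on a "very small delta" like this, a comparison helper along these lines can help. This is only a sketch: `tensors_close`, its parameters, and the idea of an absolute tolerance are assumptions for illustration, not the check that the test program actually performs.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if every |a[i] - b[i]| <= tol, else 0.
   The largest absolute difference seen is written to *max_diff_out
   (if non-NULL), so the size of the mismatch can be reported.
   A NaN difference always counts as a failure. */
int tensors_close(const float* a, const float* b, size_t n,
                  float tol, float* max_diff_out) {
    float max_diff = 0.0f;
    int ok = 1;
    for (size_t i = 0; i < n; i++) {
        float d = fabsf(a[i] - b[i]);
        if (isnan(d)) {
            ok = 0;            /* NaN never compares as close */
            continue;
        }
        if (d > max_diff) {
            max_diff = d;
        }
    }
    if (max_diff_out != NULL) {
        *max_diff_out = max_diff;
    }
    return ok && (max_diff <= tol);
}
```

Printing the maximum difference for each build makes it easier to tell a rounding-level mismatch from a real bug when switching between /fp:fast and the default floating-point mode.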
re: **/openmp** on Windows @ross-wheeler: unfortunately, Microsoft's compiler is pretty bad at this, and we need to make some cosmetic changes, e.g.
```c
void matmul_forward(float* out, float* inp, float* weight,...
```
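For context on what such cosmetic changes can look like: MSVC's classic /openmp mode implements OpenMP 2.0, which requires a signed integer loop index and has no `collapse` clause (that arrived in OpenMP 3.0). The sketch below is not the code from the comment above, which is cut off. The name `matmul_forward_flat`, the `bias` parameter, and the B/T/C/OC shape parameters are assumptions for illustration. It merges the two outer loops by hand so a plain `parallel for` is enough.

```c
#include <stddef.h>

/* Hypothetical sketch: out[bt][o] = bias[o] + sum_i inp[bt][i] * weight[o][i]
   inp is (B*T, C), weight is (OC, C), bias is (OC) or NULL, out is (B*T, OC).
   The (b, t) loops are flattened into one signed index so that an
   OpenMP 2.0 compiler accepts the pragma without collapse(2). */
void matmul_forward_flat(float* out, const float* inp, const float* weight,
                         const float* bias, int B, int T, int C, int OC) {
    int bt; /* signed loop index, as OpenMP 2.0 requires */
    #pragma omp parallel for
    for (bt = 0; bt < B * T; bt++) {
        const float* inp_bt = inp + bt * C;
        float* out_bt = out + bt * OC;
        for (int o = 0; o < OC; o++) {
            float val = (bias != NULL) ? bias[o] : 0.0f;
            const float* wrow = weight + o * C;
            for (int i = 0; i < C; i++) {
                val += inp_bt[i] * wrow[i];
            }
            out_bt[o] = val;
        }
    }
}
```

Without an OpenMP switch the pragma is ignored and the function runs serially with the same result, which makes it easy to compare the two builds.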
**Test run 1** **cl.exe /Ox /fp:fast /I. /I .\dev\win train_gpt2.c** D:\Repos\llm.c>train_gpt2.exe [GPT-2] max_seq_len: 1024 vocab_size: 50257 num_layers: 12 num_heads: 12 channels: 768 num_parameters: 124439808 train dataset num_batches: 1192 val dataset...
Now re: **/openmp**:
```
D:\Repos\llm.c>cl.exe /Ox /fp:fast /openmp /I. /I .\dev\win train_gpt2.c
Microsoft (R) C/C++ Optimizing Compiler Version 19.39.33523 for x64
Copyright (C) Microsoft Corporation. All rights reserved.

train_gpt2.c
train_gpt2.c(163):...
```
I removed parallelization on every layer except matmul_forward and got 7 seconds.
```
D:\Repos\llm.c>cl.exe /openmp:llvm /fp:fast /Ox /I. /I .\dev\win train_gpt2.c
Microsoft (R) C/C++ Optimizing Compiler Version 19.39.33523 for x64...
```
@rosslwheeler thank you very much. I have tried your project files. Very nice setup. They are very useful when debugging. Unfortunately, it's not about the project files. There is only...
The CUDA backprop is not ready yet. Andrej is still working on it. We're all waiting :)
> OpenMP Yeah. We'll need to add the OpenMP option as well. For now just copy these files locally and add it to build_msvc.bat, e.g. cl.exe /openmp /O2 /fp:fast /I....
Closing this. See #19