Antonio Stano
@dagelf i knew we could still go further with the cpu, thanks! looking into it
> So maybe this is ok to merge... 1 it looks a little funny is there no way to combine the double nested if into one condition? 2 i think...
did you check if you have OpenMP installed?

```bash
# on macOS
$ brew install libomp
# on ubuntu
$ sudo apt-get install libomp-dev
```

after installing build it again...
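If it helps, here is a rough way to check whether a build actually picked up OpenMP. It is only a sketch: the function names `openmp_version` and `report_openmp` are made up for illustration, and the only real thing it relies on is the standard `_OPENMP` macro, which a compiler defines when OpenMP is enabled (for example with `-fopenmp`).

```c
#include <stdio.h>

/* Hypothetical helper: returns the value of the standard _OPENMP macro
 * (a yyyymm date) when the compiler enabled OpenMP, or 0 otherwise. */
int openmp_version(void) {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

/* Hypothetical helper: prints a one-line report about OpenMP support. */
void report_openmp(void) {
    int version = openmp_version();
    if (version != 0) {
        printf("OpenMP enabled (_OPENMP = %d)\n", version);
    } else {
        printf("OpenMP not enabled in this build\n");
    }
}
```

Calling `report_openmp()` from a small `main` compiled with the same flags as the project should show whether the flag and the library were picked up.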
we get this error when mean_loss is -1.0f, which is its initial value; this attribute is only updated in the gpt2_forward function when the target param is...
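To illustrate the pattern, here is a minimal sketch of a sentinel loss value. `ToyModel`, `toy_init`, `toy_forward` and `toy_has_loss` are hypothetical names for this example only and are not the real llm.c code; the sketch just assumes, as described above, that the loss starts at -1.0f and is only written when targets are passed.

```c
#include <stddef.h>

/* Hypothetical stand-in for a model struct; not the real llm.c type. */
typedef struct {
    float mean_loss; /* -1.0f means "no loss has been computed yet" */
} ToyModel;

/* Set the sentinel so callers can tell that no loss exists yet. */
void toy_init(ToyModel *model) {
    model->mean_loss = -1.0f;
}

/* Hypothetical forward pass: mean_loss is only updated when targets
 * are provided, mirroring the behaviour described in the comment. */
void toy_forward(ToyModel *model, const float *losses,
                 const int *targets, int n) {
    if (targets == NULL || n <= 0) {
        return; /* inference only: mean_loss keeps its previous value */
    }
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += losses[i];
    }
    model->mean_loss = sum / (float)n;
}

/* Returns 1 if a loss has been computed, 0 if the sentinel is still set. */
int toy_has_loss(const ToyModel *model) {
    return model->mean_loss != -1.0f;
}
```

Any step that needs the loss can check `toy_has_loss` first, so a forward call without targets is reported clearly instead of being used as if it had produced a loss.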
-Ofast already enables all the optimizations from -O3 and also other aggressive optimizations, maybe it's better to either use -O3 or -Ofast, not both?
@karpathy or maybe the optimizations could be custom-made LLVM passes targeting train_gpt2 instead of being automatically generated from the -O flags
The -Ofast flag enables the `-ffast-math` flag. With -ffast-math the compiler is allowed to assume that non-finite values (NaN, Inf) never occur, and it can also ignore some floating-point rules that would otherwise forbid certain optimizations. It's...
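A small sketch of the two effects mentioned above, built without `-ffast-math`. The function names are made up for illustration. The first pair shows that float addition is not associative, so a compiler that is allowed to reorder it can change the result; the last function is the usual self-comparison NaN test, which a compiler may fold to "always false" once it is told that NaN never occurs.

```c
#include <math.h>

/* Adds left to right: (a + b) + c. */
float sum_left(float a, float b, float c) {
    volatile float partial = a + b; /* volatile forces rounding to float */
    return partial + c;
}

/* Adds right to left: a + (b + c). With strict floating point this can
 * differ from sum_left; fast-math lets the compiler treat them as equal. */
float sum_right(float a, float b, float c) {
    volatile float partial = b + c; /* volatile forces rounding to float */
    return a + partial;
}

/* Classic NaN check: NaN is the only value not equal to itself.
 * Under -ffast-math the compiler may assume this is never true. */
int is_nan_by_compare(float x) {
    return x != x;
}
```

With `a = 1e8f`, `b = -1e8f`, `c = 1.0f`, `sum_left` gives 1.0f while `sum_right` gives 0.0f, because `-1e8f + 1.0f` rounds back to `-1e8f` in single precision.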
Ok I guess I found the problem, now -Ofast and -O3 work together when both are enabled; the issue was caused by -ffast-math being enabled and messing with floating point arithmetic. This will...
@Infatoshi Yes, if `CFLAGS = -O3 -Ofast -fno-fast-math -Wno-unused-result` doesn't work, using only `CFLAGS = -O3 -Wno-unused-result` without -Ofast should work
Ok so from what i saw, and to recap, on ubuntu machines (i am on 22.04):

```Makefile
CFLAGS = -O3 -Ofast -fno-fast-math -Wno-unused-result
```

will work. If this doesn't work...