Azret Botash

Results: 29 comments by Azret Botash

see https://github.com/karpathy/llm.c/pull/156

Please don't forget about MSVC/Windows. MSVC uses a pragma to turn optimization off:

```
#pragma optimize( "", off )
/* unoptimized code section */
#pragma optimize( "", on )
```

This...

> > Please don't forget about MSVC/Windows. MSVC uses a pragma to turn optimization off.
> >
> > #pragma optimize( "", off ) /* unoptimized code section */ #pragma optimize(...

Would you be able to port it to atomic_compare_exchange_weak?

```
void atomic_add(_Atomic float* dest, float val) {
    float old = *dest;
    float new_value;
    do {
        new_value = old + val;
    } while (!atomic_compare_exchange_weak(dest, &old, new_value));
}
```

Does not...

Try this:

```
void test() {
    // verify basic case.... expect about ~ 1.0
    float sum = 0.999991;
    atomicAdd(&sum, 0.000009);
    int B = 1024;
    int T = 1024;
    int C =...
```

We might need atomic accumulation in the backward pass if the number of collisions on the same bucket is significant enough to skew the gradient. And I say...

Yeah, I think so. I was looking at the same thing, trying to wrap my head around why the matmul does not report any errors when **int b = bt...

The term in bold gets canceled out.

float* x = _In + **b * T * C** + t * C;
float* y = _Out + **b * T *...