Chris Dryden
iOS requires audio to be started from a part of the UI (a user interaction): https://github.com/alemangui/pizzicato/issues/81
Figured out that I needed to change the block size dynamically based on the value of C and the current block size, and it is now around 0.0020 ms faster!...
To start off, I will first implement the layernorm forward inside the backward pass implementation and use the ln1 and ln2 values directly from that layernorm forward to get an...
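As a rough illustration of what recomputing the layernorm forward involves, here is a minimal sketch in plain Python rather than CUDA. It is not the kernel from the PR: the function name `layernorm_forward_row`, the `eps` default, and the per-row layout are assumptions made for the example.

```python
import math

def layernorm_forward_row(x, weight, bias, eps=1e-5):
    """Recompute LayerNorm for one row of C channels (hypothetical helper).

    Returns (out, mean, rstd). A backward pass that recomputes the forward
    can use out directly instead of reading a stored activation, and can
    reuse mean and rstd for the gradient computation.
    """
    C = len(x)
    mean = sum(x) / C
    # Biased variance over the channel dimension, as LayerNorm uses.
    var = sum((v - mean) ** 2 for v in x) / C
    rstd = 1.0 / math.sqrt(var + eps)
    out = [(v - mean) * rstd * w + b for v, w, b in zip(x, weight, bias)]
    return out, mean, rstd
```

The trade-off is extra arithmetic in the backward pass in exchange for not keeping the normalized activations in memory.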
In the above PR I was able to implement the reduced memory: Went from this with recompute set to 1: ``` allocating 1439 MiB for activations val loss 4.503491 allocating...
The PR was merged but still needs the second step: a simplified kernel that doesn't recompute everything and instead reuses the values calculated in the forward pass.
https://github.com/karpathy/llm.c/pull/319 this one adds floatX to the dev CUDA kernel for this
I am embarrassed: running this yesterday, I was getting numbers closer to 600 GB/s for both kernel 6 and kernel 9 throughput and around 900 GB/s for kernel 8 throughput...
Hey @BurntSushi, I was hoping we could get your advice on the approach you would recommend for adding locale support for Ethiopian and Thai years using the locale env var. These...
There's a PR here that does some advanced stuff to handle both default SIGPIPE and ignored SIGPIPE to match GNU behavior, but it does it in a totally different way.
Whoops, meant to add the link to this PR with that comment: https://github.com/uutils/coreutils/pull/9184 I don't think this implementation covers both cases: default SIGPIPE and SIGPIPE ignored.
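For readers unfamiliar with the two cases, here is a small Python sketch of the difference between default and ignored SIGPIPE on a POSIX system. It is not taken from either PR; `set_sigpipe` and `write_to_closed_pipe` are hypothetical helpers written for this example.

```python
import os
import signal

def set_sigpipe(ignore):
    """Set the SIGPIPE disposition and return the previous handler.

    ignore=True  -> SIG_IGN: writes to a closed pipe fail with EPIPE.
    ignore=False -> SIG_DFL: such a write kills the process with SIGPIPE.
    """
    action = signal.SIG_IGN if ignore else signal.SIG_DFL
    return signal.signal(signal.SIGPIPE, action)

def write_to_closed_pipe():
    """Write to a pipe whose read end is closed; only safe when SIGPIPE is ignored."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    try:
        os.write(write_fd, b"x")
        return "ok"
    except BrokenPipeError:
        # With SIGPIPE ignored, the kernel reports EPIPE instead of signalling.
        return "EPIPE"
    finally:
        os.close(write_fd)
```

With SIGPIPE ignored, a tool sees the error and can print a diagnostic. With the default disposition the process is terminated silently, which is why the two cases need separate handling.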