Chris Dryden

Results 68 comments of Chris Dryden

iOS requires that audio start be connected to a part of the UI: https://github.com/alemangui/pizzicato/issues/81

I figured out that I needed to change the block size dynamically based on the value of C and the current block size, and it is now around 0.0020 ms faster!...

To start off, I will implement the layernorm forward inside the backward pass implementation and use the ln1 and ln2 values directly from that layernorm forward to get an...

In the above PR I was able to implement the reduced memory: Went from this with recompute set to 1: ``` allocating 1439 MiB for activations val loss 4.503491 allocating...

The PR was merged, but it still needs the second step: a simplified kernel that doesn't recompute everything and instead reuses the values calculated in the forward pass.

https://github.com/karpathy/llm.c/pull/319 This one adds floatX to the dev CUDA kernel for this.

I am embarrassed: running this yesterday, I was getting numbers closer to 600 GB/s for both kernel 6 and kernel 9 throughput and around 900 GB/s for kernel 8 throughput...

Hey @BurntSushi, I was hoping we could get your advice on the approach you would recommend for adding locale support for Ethiopian and Thai years using the locale env var. These...

There's a PR here that does some advanced stuff to handle both default SIGPIPE and ignored SIGPIPE to match GNU behavior, but it does it in a totally different way.

Whoops, meant to add the link to this PR with that comment: https://github.com/uutils/coreutils/pull/9184 I don't think this implementation covers default SIGPIPE and SIGPIPE ignored.
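
For anyone following along, here is a minimal sketch of what the two cases look like from the outside. It is written in Python and is not the uutils implementation. It assumes a POSIX system with a `yes` binary on PATH, and the helper name `pipe_exit_status` is made up for illustration. It relies on the `restore_signals` parameter of `subprocess.Popen`: CPython ignores SIGPIPE at startup, so with `restore_signals=False` the child inherits the ignored disposition, while the default `True` resets it to the default action before exec.

```python
import signal
import subprocess


def pipe_exit_status(ignore_sigpipe: bool) -> int:
    """Run `yes` writing into a pipe whose read end is closed, and return its status.

    Hypothetical helper for illustration only. Assumes the parent is a normal
    CPython process, which sets SIGPIPE to SIG_IGN at interpreter startup.
    """
    proc = subprocess.Popen(
        ["yes"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # hide any "Broken pipe" diagnostic
        # True  -> child gets SIGPIPE reset to SIG_DFL (default SIGPIPE case)
        # False -> child inherits SIG_IGN from the parent (SIGPIPE ignored case)
        restore_signals=not ignore_sigpipe,
    )
    proc.stdout.close()  # close the read end so the child's next write fails
    return proc.wait()   # negative value -N means "killed by signal N"


if __name__ == "__main__":
    print("default SIGPIPE:", pipe_exit_status(ignore_sigpipe=False))
    print("SIGPIPE ignored:", pipe_exit_status(ignore_sigpipe=True))
```

In the default case the child should be killed by the signal, which `subprocess` reports as a negative return code. In the ignored case the write fails with EPIPE instead, and the child is expected to exit on its own with an ordinary status.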