Hayden Prairie
Running the following in a fresh conda environment with Python 3.11. CPU: 3970X Threadripper. GPU: 7900 XTX. OS: Ubuntu 22.04.1. ``` DEBUG=3 python3 -c "from tinygrad import Tensor; N =...
### Problem Description After compiling a Triton kernel, it runs and works. However, if I create a kernel with the same name in another Python file, then I will...
While srush has a creative way to do a roll function in Triton, I also thought I would open up an issue to see if this could potentially get implemented...
Hi, I was wondering if there were any plans to add padding support to global loads and stores. This would bring the functionality of ThunderKittens a lot closer to Triton...
Hey, really awesome repo. I have been trying to set up a ThunderKittens env; however, I am struggling, as I am getting import errors with clangd. When including kittens, I...
Hey Tri and Albert, digging through the codebase, I have found a bug in the backward pass for the gradient calculations in the function `_chunk_scan_bwd_ddAcs_stable_kernel`. I think this solves...