tinygrad
Use local memory in kernels ($1000 bounty)
Claim the bounty by implementing this and having tinygrad generate GEMM kernels for NVIDIA that are faster than torch/cuBLAS.
Clean code only, must be merged to claim bounty.
Forgive me if I'm asking a rudimentary question. When executing tinygrad/accel/triton/ops_triton.py I get: ImportError: cannot import name 'ExplicitExecAST' from 'tinygrad.ops'. Looking in tinygrad/ops.py, I can't find 'ExplicitExecAST' anywhere. What am I missing?
triton isn't supported anymore; you'd have to fix it.
@geohot Can you clarify whether the goal is to call cuBLAS from tinygrad (e.g. with cupy), or custom GEMM kernel generation that is faster than cuBLAS? The former seems too straightforward, while the latter seems too complex.
Custom: tinygrad generates the GEMM kernel using local memory, faster than cuBLAS/PyTorch.
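For anyone unfamiliar with the technique being asked for: "local memory" (shared memory in CUDA terms) GEMM means tiling the matrices so each workgroup loads a small tile into fast on-chip memory and reuses it many times. Here is a minimal NumPy sketch of that tiling idea, purely to illustrate the access pattern; the function name, tile size, and structure are illustrative and are not tinygrad's API or the actual bounty kernel.

```python
import numpy as np

def tiled_matmul(A, B, tile=16):
    # Illustrative blocked GEMM: each (i0, j0) block plays the role of one
    # workgroup, and a_tile/b_tile stand in for the tiles a real kernel
    # would stage into local/shared memory before accumulating.
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i0 in range(0, M, tile):
        for j0 in range(0, N, tile):
            # Accumulator for one output tile (clipped at the edges)
            acc = np.zeros((min(tile, M - i0), min(tile, N - j0)), dtype=A.dtype)
            for k0 in range(0, K, tile):
                a_tile = A[i0:i0 + tile, k0:k0 + tile]  # "local" copy of an A tile
                b_tile = B[k0:k0 + tile, j0:j0 + tile]  # "local" copy of a B tile
                acc += a_tile @ b_tile                  # reuse both tiles fully
            C[i0:i0 + acc.shape[0], j0:j0 + acc.shape[1]] = acc
    return C
```

On a GPU the payoff is that each element of a tile is read from global memory once but used `tile` times, which is what lets a hand-tiled kernel approach cuBLAS throughput.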
Changes made in tinygrad/:
------------------------------------------------------------
files insertions deletions
------------------------------------------------------------
tinygrad/ast.py 6 0
tinygrad/llops/ops_gpu.py 218 31
tinygrad/runtime/cuda.py 9 6
tinygrad/runtime/metal.py 43 2
tinygrad/runtime/opencl.py 13 1
tinygrad/shape/__init__.py 1 1
------------------------------------------------------------
total 290 41
------------------------------------------------------------
lines added in the tinygrad folder: 249
So stale now.