qazal
- [x] Deeper realized_children search (r3 in Adam!)
  - Tests: #4321 #4342 #4348
  - Scheduler changes: #4322
- [x] Simplify grouping
  - Tests: #4420 #4357 #4420
  - Scheduler changes: #4491...
- [ ] Detect self-contained subgraphs
- [ ] Delete forced_realize
I think we can generalize reduceop elementwise fusion to any LazyBuffer graph. This diff is a starting point towards E2_ kernels, fusing realized reduceops: The main challenge is fusing...
results so far:

- `SAVE_SCHEDULE=1 python3 test/models/test_mnist.py TestMNIST.test_adam_onestep`
- `python3 examples/llm.c/export.py`

todo:

- [x] Assign
- [ ] less lines
### how the trick works (currently)

In the scheduler, we default to fusing padded view LazyBuffers.

https://github.com/tinygrad/tinygrad/blob/794acefbf3c0dababb47632107e299e75e805577/tinygrad/engine/schedule.py#L137-L140

Then LazyBuffers with `UNSAFE_PAD_OPS = {UnaryOps.RECIP, UnaryOps.LOG2, UnaryOps.EXP2, BinaryOps.IDIV}` get force realized.

https://github.com/tinygrad/tinygrad/blob/master/tinygrad/engine/schedule.py#L137-L140...
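A small illustration of why ops like `EXP2` might be treated as unsafe to fuse through a pad. This is plain Python, not tinygrad code. It assumes that padding fills with zeros and that an op is "safe" when it maps 0 to 0. The helpers `pad` and `exp2` are hypothetical names made up for this sketch.

```python
import math

def pad(xs, n, fill=0.0):
    # append n fill values, standing in for a zero-padded view
    return list(xs) + [fill] * n

def exp2(x):
    # stand-in for UnaryOps.EXP2
    return 2.0 ** x

data = [1.0, 4.0, 9.0]

# sqrt(0) == 0, so applying the op after the pad matches padding the op's output
safe_fused = [math.sqrt(x) for x in pad(data, 2)]
safe_ref = pad([math.sqrt(x) for x in data], 2)

# exp2(0) == 1, so the padded region is no longer zero once the op is fused in
unsafe_fused = [exp2(x) for x in pad(data, 2)]
unsafe_ref = pad([exp2(x) for x in data], 2)

print(safe_fused == safe_ref)      # the two orders agree
print(unsafe_fused == unsafe_ref)  # the two orders disagree
```

Under these assumptions, realizing the buffer before the pad keeps the padded region at zero, which is one reading of why those LazyBuffers get force realized.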
const renderers have an extra call for rendering a vectorized const. This can simplify to: ``` # returns a str expression of the const with the given type def render_const(self,...
auto-inserts BARRIER in the uop toposort once all the STORE children of a LocalBuffer are toposorted. Unblocks #4957

- [x] insert BARRIER in uops
- [ ] dedup UOps.IF gates...
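The idea of emitting a BARRIER once every STORE to a local buffer has been placed can be sketched on a toy graph. This is not the UOpGraph implementation. `Node`, `toposort_with_barriers`, and the convention that `srcs[0]` of a STORE is its buffer are all assumptions made for this example, as is the choice to have the LOAD list the STOREs as sources so it sorts after them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Node:
    op: str                          # e.g. "DEFINE_LOCAL", "CONST", "STORE", "LOAD"
    name: str
    srcs: Tuple["Node", ...] = ()

def toposort_with_barriers(sinks: List[Node]) -> List[str]:
    # depth-first toposort: every node is appended after all of its sources
    order: List[Node] = []
    seen = set()
    def visit(n: Node) -> None:
        if n in seen: return
        seen.add(n)
        for s in n.srcs: visit(s)
        order.append(n)
    for s in sinks: visit(s)

    # count the STOREs that target each local buffer (assumed to be srcs[0])
    pending: Dict[Node, int] = {}
    for n in order:
        if n.op == "STORE" and n.srcs[0].op == "DEFINE_LOCAL":
            pending[n.srcs[0]] = pending.get(n.srcs[0], 0) + 1

    # walk the order and emit a BARRIER right after the last STORE of each buffer
    out: List[str] = []
    for n in order:
        out.append(f"{n.op} {n.name}")
        if n.op == "STORE" and n.srcs[0] in pending:
            pending[n.srcs[0]] -= 1
            if pending[n.srcs[0]] == 0:
                out.append(f"BARRIER {n.srcs[0].name}")
    return out

# toy graph: two stores into one local buffer, then a load that reads it
buf = Node("DEFINE_LOCAL", "temp")
st0 = Node("STORE", "st0", (buf, Node("CONST", "0")))
st1 = Node("STORE", "st1", (buf, Node("CONST", "1")))
ld = Node("LOAD", "ld", (buf, st0, st1))
schedule = toposort_with_barriers([ld])
print("\n".join(schedule))
```

In this sketch the barrier lands between the last STORE and the LOAD without the caller having to insert it by hand.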
The linearizer has extra complexity to support these. They need to integrate with UOpGraph.

- [x] UOps.IF, everything should be a gated store that gets rewritten as an IF block...
In mean+stddev, softmax and layernorm, one reduceop builds on its parent reduceop. tinygrad is making progress towards fusing these into a single kernel.

### Milestones

1. mean+stddev fusion -...
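To make the dependency between the two reduceops concrete, here is a minimal plain-Python sketch of mean+stddev. It is not tinygrad code. `mean_std` is a hypothetical helper, and it uses the population variance (dividing by `n`) purely to keep the arithmetic simple.

```python
import math
from typing import Sequence, Tuple

def mean_std(xs: Sequence[float]) -> Tuple[float, float]:
    n = len(xs)
    # first reduce: sum over the input
    mean = sum(xs) / n
    # second reduce: also a sum over the input, but every term needs `mean`,
    # so it cannot start until the first reduce has finished
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, math.sqrt(var)

mean, std = mean_std([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(mean, std)
```

Without fusion, the first reduce would be written out as its own kernel and read back by the second. Fusing them means both reductions run inside one kernel.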
llm.c has two ways of allocating memory for a multi reduce kernel with GROUP:

1. split a single block between reduceops https://github.com/karpathy/llm.c/blob/master/llmc/layernorm.cuh#L165-L175 [reuse a split](https://github.com/karpathy/llm.c/blob/master/llmc/layernorm.cuh#L295) when we stored it to...