qazal
This diff (#4030) deletes the "can only have one output buffer" constraint:
```py
# can only have one output buffer
# can only reduce contiguous
# max one reduceop...
```
I think the JIT should use `Tensor.corealize` to let the scheduler fuse outputs; the user can call `.realize()` to opt out of the fusion. Will enable the tests once multioutput is merged.
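To illustrate why co-realizing gives the scheduler room to fuse outputs, here is a toy model in plain Python. It is not tinygrad's scheduler: `LazyOut`, `schedule_separately` and `schedule_together` are hypothetical names, and the grouping rule (outputs that read exactly the same inputs share one kernel) is a deliberate simplification of real fusion rules.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LazyOut:
    """Hypothetical unrealized output: a name plus the set of buffers it reads."""
    name: str
    inputs: frozenset

def schedule_separately(outs):
    # Realizing each output on its own: one kernel per output.
    return [[o.name] for o in outs]

def schedule_together(outs):
    # Co-realizing: outputs that read exactly the same inputs are grouped
    # into one multi-output kernel (a simplification, not tinygrad's rule).
    groups = {}
    for o in outs:
        groups.setdefault(o.inputs, []).append(o.name)
    return list(groups.values())

outs = [
    LazyOut("a_plus_b", frozenset({"a", "b"})),
    LazyOut("a_times_b", frozenset({"a", "b"})),
    LazyOut("neg_c", frozenset({"c"})),
]
print(schedule_separately(outs))  # [['a_plus_b'], ['a_times_b'], ['neg_c']]
print(schedule_together(outs))    # [['a_plus_b', 'a_times_b'], ['neg_c']]
```

In this toy, realizing each tensor individually corresponds to `schedule_separately`, which is the opt-out path; handing all outputs over at once is what makes the grouping possible.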
`FUZZ_SCHEDULE=1 DEBUG=2 python3 test/test_multitensor.py TestMultiTensor.test_simple_add_X`

Still needs more time on:
- [x] What is the right abstraction? fuzzer.py could merge with realize.py -> moved to /tests
- [x] Seedless `.randn`...
Trains beautiful_mnist with ~25% fewer kernels.
- this diff: https://tiny-tools-client.vercel.app/?id=2edc5abc69f74cc1ae5eb3b25a9ac292
- master: https://tiny-tools-client.vercel.app/?id=0c8240da3a9b4eef8759987ef3df4708
Tested by fuzzing the stdout of `for si in schedule: print(si.outputs[0])` in `test/test_multitensor.py TestMultiTensor.test_simple_add_X`: https://gist.github.com/Qazalin/cd7d88ba1b221ed58b46ea4a091f3a89

There can be multiple valid topological sorts of a DAG. In this graph (https://tiny-tools-client.vercel.app/?id=785de65a9e5246ae9c6f651a3f96d453), any ordering...
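A small way to see why schedule order needs fuzzing: a DAG generally admits several valid topological orders. The sketch below enumerates all of them by backtracking over nodes that have no unscheduled parents. The node names are made up for illustration, and this is not tinygrad's fuzzer.

```python
def all_toposorts(graph):
    """Return every valid topological order of a DAG.

    graph maps each node to the list of nodes that depend on it (its children).
    """
    indeg = {n: 0 for n in graph}
    for children in graph.values():
        for c in children:
            indeg[c] = indeg.get(c, 0) + 1
    nodes = list(indeg)
    order, results = [], []

    def backtrack():
        if len(order) == len(nodes):
            results.append(list(order))
            return
        for n in nodes:
            # any node whose parents are all scheduled may go next
            if indeg[n] == 0 and n not in order:
                order.append(n)
                for c in graph.get(n, []):
                    indeg[c] -= 1
                backtrack()
                # undo the choice before trying the next candidate
                for c in graph.get(n, []):
                    indeg[c] += 1
                order.pop()

    backtrack()
    return results

# Two independent loads feeding one add: both load orders are valid.
schedule = {"load_a": ["add"], "load_b": ["add"], "add": ["store"], "store": []}
for o in all_toposorts(schedule):
    print(o)
# ['load_a', 'load_b', 'add', 'store']
# ['load_b', 'load_a', 'add', 'store']
```

The number of orders can grow exponentially with the number of independent nodes, so a practical fuzzer would cap the count or sample random valid orders rather than list them all.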
`FUZZ_UOPS=1` sometimes generates wrong toposorts. These constraints should be built into the uops.py graph:
- [x] DEFINE_ACC comes before the outer loop #4656
- [ ] Replace priority queue toposort with...
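One way to read "built into the graph": express an ordering requirement as an explicit edge, so that no choice of priorities (and no fuzzed order) can violate it. Below is a generic priority-queue toposort (Kahn's algorithm with `heapq`). The node names and priority values are hypothetical stand-ins, not the actual uops.py code.

```python
import heapq

def priority_toposort(graph, priority):
    """Kahn's algorithm: among ready nodes, emit the one with the lowest priority value.

    graph maps every node to its list of children; every node must be a key.
    """
    indeg = {n: 0 for n in graph}
    for children in graph.values():
        for c in children:
            indeg[c] += 1
    ready = [(priority[n], n) for n in graph if indeg[n] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, n = heapq.heappop(ready)
        order.append(n)
        for c in graph[n]:
            indeg[c] -= 1
            if indeg[c] == 0:
                heapq.heappush(ready, (priority[c], c))
    if len(order) != len(graph):
        raise ValueError("graph has a cycle")
    return order

prio = {"RANGE": 0, "DEFINE_ACC": 1, "ACC_UPDATE": 2}
# Order left to priorities: both roots are ready and RANGE is popped first.
loose = {"DEFINE_ACC": ["ACC_UPDATE"], "RANGE": ["ACC_UPDATE"], "ACC_UPDATE": []}
# Order built into the graph: an explicit DEFINE_ACC -> RANGE edge.
strict = {"DEFINE_ACC": ["RANGE", "ACC_UPDATE"], "RANGE": ["ACC_UPDATE"], "ACC_UPDATE": []}
print(priority_toposort(loose, prio))   # ['RANGE', 'DEFINE_ACC', 'ACC_UPDATE']
print(priority_toposort(strict, prio))  # ['DEFINE_ACC', 'RANGE', 'ACC_UPDATE']
```

With the edge in place, the priority only breaks ties among nodes that are genuinely independent, so reordering them cannot produce an invalid program.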
Currently only expands trigger the pad path. We should generalize this to all pads.
- [x] Simple pads (tests: #4570, scheduler change: #4614)
- [ ] Complex pads where mask isn't...
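For readers unfamiliar with masks: a simple pad can be viewed as a valid region (the mask) over the source buffer, with the pad value returned everywhere outside it. This 1D toy shows that index mapping; `padded_read` is a hypothetical helper, not tinygrad's view or scheduler code.

```python
def padded_read(src, pad_left, pad_right, i):
    """Read position i of src zero-padded by pad_left/pad_right elements.

    The mask is the half-open range [pad_left, pad_left + len(src)): reads inside it
    map back to src, reads outside it return the pad value 0.
    """
    lo, hi = pad_left, pad_left + len(src)
    if not 0 <= i < hi + pad_right:
        raise IndexError(i)
    return src[i - lo] if lo <= i < hi else 0

print([padded_read([1, 2, 3], 2, 1, i) for i in range(6)])  # [0, 0, 1, 2, 3, 0]
```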
If the first expand has no other children, it shouldn't be realized. Otherwise, it can become a multi-output kernel if it's not a reduceop.

Milestones:
- [ ] Generalize the...
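The rule for the first expand can be written as a small decision function. This is a hypothetical encoding, not scheduler code: `first_expand_action` is an illustrative name, it assumes "it's not a reduceop" refers to the buffer being expanded, and it assumes the remaining case (several children, reduceop) keeps the current behavior of realizing.

```python
def first_expand_action(num_children: int, is_reduceop: bool) -> str:
    """Hypothetical encoding of the first-expand rule.

    Assumptions: "it's not a reduceop" refers to the buffer being expanded,
    and a shared reduceop keeps being realized.
    """
    if num_children <= 1:
        # no other consumer: don't realize it, fuse it into its only child
        return "fuse"
    if not is_reduceop:
        # several consumers and not a reduceop: one output of a multi-output kernel
        return "multioutput"
    # several consumers and a reduceop: fall back to realizing it
    return "realize"

print(first_expand_action(1, True))   # fuse
print(first_expand_action(3, False))  # multioutput
print(first_expand_action(3, True))   # realize
```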
This diff could fix the ImageDType regression in openpilot and the conv2 backward double reduce.