tinygrad
Make efficientnet trainer fast
- [x] Support asymmetric padding
- [ ] Support caching of only unresolved LazyBuffers (see `CACHE_LAZYBUFFERS`)
- [ ] Avoid running the convs multiple times (aka late split). This also fixes an openpilot runner bug
  ```sh
  # <enum 'ProcessingOps'> 70
  GRAPH=1 CNT=1 GPU=1 PYTHONPATH="." python3 examples/benchmark_train_efficientnet.py
  # <enum 'ProcessingOps'> 208
  GRAPH=1 CNT=1 LAZY=1 PYTHONPATH="." python3 examples/benchmark_train_efficientnet.py
  ```
- [x] Support counting FLOPS and work out theoretical max
- [x] "Upstream" LAZY and make it default
- [ ] Remove padding from the ml conv op and remove the slice on execution?
- [ ] Collapse reduce into binary op
- [ ] Improve the "JIT" so it does not constant-fold values like the learning rate (LR).
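
For the FLOPS-counting item above, a rough sketch of the arithmetic in plain Python. This is not tinygrad's counter: `conv2d_flops` and `utilization` are hypothetical helpers, the shapes are illustrative, and a multiply-add is assumed to count as 2 FLOPs.

```python
def conv2d_flops(batch, cin, cout, out_h, out_w, kernel_h, kernel_w, groups=1):
    """FLOPs for one forward conv2d (hypothetical helper, not a tinygrad API).

    Assumes each multiply-add counts as 2 FLOPs and ignores bias and padding.
    """
    # Each output element reduces over (cin / groups) * kernel_h * kernel_w inputs.
    macs_per_output = (cin // groups) * kernel_h * kernel_w
    num_outputs = batch * cout * out_h * out_w
    return 2 * macs_per_output * num_outputs


def utilization(flops, seconds, peak_flops_per_second):
    """Fraction of the theoretical max reached (1.0 means running at peak)."""
    return flops / (seconds * peak_flops_per_second)


# Illustrative shapes: a 3x3 conv from 3 to 32 channels with a 112x112 output.
stem_flops = conv2d_flops(batch=1, cin=3, cout=32, out_h=112, out_w=112,
                          kernel_h=3, kernel_w=3)
print(stem_flops)  # 21676032
```

Summing this over every conv in a training step and dividing by the measured step time gives achieved FLOPS; `utilization` compares that against whatever peak figure is assumed for the device.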