tinygrad



Make efficientnet trainer fast

Open geohot opened this issue 2 years ago • 0 comments

[x] Support asymmetric padding
[ ] Support caching of only unresolved LazyBuffers. See CACHE_LAZYBUFFERS.
[ ] Support not running the convs multiple times, aka late split. This fixes an openpilot runner bug too.
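
For context on the asymmetric padding item: EfficientNet was originally released with TensorFlow-style "SAME" padding, which puts the extra pixel on the bottom/right when the total padding is odd. The sketch below is not tinygrad code. `same_padding` and `pad2d` are hypothetical helper names that work on plain nested lists, just to show why a single symmetric padding value cannot express this.

```python
import math

def same_padding(size: int, kernel: int, stride: int) -> tuple:
    """TensorFlow-style SAME padding for one spatial dimension.

    Returns (before, after). When the total is odd, the extra
    element goes after, which is the asymmetric case.
    """
    out = math.ceil(size / stride)
    total = max((out - 1) * stride + kernel - size, 0)
    before = total // 2
    return before, total - before

def pad2d(x, pad_y, pad_x, value=0.0):
    """Pad a 2D nested list with independent top/bottom and left/right amounts."""
    top, bottom = pad_y
    left, right = pad_x
    width = len(x[0]) + left + right
    rows = [[value] * width for _ in range(top)]
    rows += [[value] * left + list(row) + [value] * right for row in x]
    rows += [[value] * width for _ in range(bottom)]
    return rows

# A 3x3 stride-2 conv on a 224-wide input needs 0 before and 1 after.
print(same_padding(224, 3, 2))
print(pad2d([[1, 2], [3, 4]], pad_y=(0, 1), pad_x=(0, 1), value=0))
```

With a 5x5 stride-2 conv on a 112-wide input the same formula gives 1 before and 2 after, so the asymmetry shows up for several of the stride-2 layers, not just the stem.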

# <enum 'ProcessingOps'> 70
GRAPH=1 CNT=1 GPU=1 PYTHONPATH="." python3 examples/benchmark_train_efficientnet.py
# <enum 'ProcessingOps'> 208
GRAPH=1 CNT=1 LAZY=1 PYTHONPATH="." python3 examples/benchmark_train_efficientnet.py

[x] Support counting FLOPS and work out theoretical max
[x] "Upstream" LAZY and make it default
[ ] Remove padding from the ml conv op and remove the slice on execution?
[ ] Collapse reduce into binary op
[ ] Improve the "JIT" so it does not constant-fold values like the learning rate (LR).
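
As a rough illustration of the FLOPS counting item: the usual convention counts one multiply-accumulate as 2 FLOPs, and the "theoretical max" comparison is achieved FLOPS divided by the device peak. This is a minimal sketch, not how tinygrad counts them. `conv2d_flops` and `utilization` are hypothetical names, and the peak figure in the usage lines is a made-up placeholder, not a measured number.

```python
def conv2d_flops(bs, cin, cout, out_h, out_w, kh, kw, groups=1):
    """Forward-pass FLOPs of a 2D convolution (bias ignored).

    Each output element needs (cin / groups) * kh * kw multiply-accumulates,
    and each multiply-accumulate is counted as 2 FLOPs.
    """
    macs = bs * cout * out_h * out_w * (cin // groups) * kh * kw
    return 2 * macs

def utilization(flops, seconds, peak_flops_per_s):
    """Fraction of the theoretical peak reached by a timed run."""
    return flops / seconds / peak_flops_per_s

# EfficientNet-style stem: 3 -> 32 channels, 3x3 kernel, 112x112 output.
stem = conv2d_flops(1, 3, 32, 112, 112, 3, 3)
# Depthwise 3x3 conv over 32 channels at the same resolution.
depthwise = conv2d_flops(1, 32, 32, 112, 112, 3, 3, groups=32)
print(stem, depthwise)

# Placeholder numbers: 1 GFLOP of work in 0.5 s on a 4 GFLOP/s device.
print(utilization(1e9, 0.5, 4e9))
```

Depthwise convs do far fewer FLOPs per output element than dense convs, which is one reason a FLOPS-based ceiling can look far away for EfficientNet even when memory traffic is the real limit.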

Jun 25 '22 06:06 geohot