qazal
@0xtimmy also, does this diff allow for softmax fusion? I didn't find it in test_linearizer
what's blocking this from using virtual edges instead of another special toposort? we wanna make the `graph: DefaultDict[UOp, List[UOp]]` such that any toposort is valid.
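A minimal sketch of the virtual-edge idea, assuming a plain children map shaped like the `graph: DefaultDict[UOp, List[UOp]]` in the comment. `Node`, `add_edge`, and `toposort` are hypothetical helpers, not tinygrad's API, and Python's `graphlib` stands in for tinygrad's own toposort. The ordering constraint (read `buf0` before overwriting it) is stored in the graph as an extra edge, so every valid toposort respects it without a special sort.

```python
from collections import defaultdict
from graphlib import TopologicalSorter
from typing import DefaultDict, List, Set

class Node:
    """Hypothetical stand-in for a UOp; only its identity matters here."""
    def __init__(self, name: str) -> None:
        self.name = name
    def __repr__(self) -> str:
        return self.name

def add_edge(graph: DefaultDict[Node, List[Node]], parent: Node, child: Node) -> None:
    # graph maps each node to the nodes that must be scheduled after it
    graph[parent].append(child)

def toposort(graph: DefaultDict[Node, List[Node]]) -> List[Node]:
    # graphlib expects node -> predecessors, so invert the children map
    preds: DefaultDict[Node, Set[Node]] = defaultdict(set)
    for parent, children in graph.items():
        preds.setdefault(parent, set())
        for child in children:
            preds[child].add(parent)
    return list(TopologicalSorter(preds).static_order())

graph: DefaultDict[Node, List[Node]] = defaultdict(list)
load, compute = Node("load_buf0"), Node("compute")
store_out, store_buf0 = Node("store_out"), Node("store_buf0")
# real data edges
add_edge(graph, load, compute)
add_edge(graph, compute, store_out)
# virtual edge: no data flows from load to store_buf0, but buf0 must be
# read before it is overwritten, so the order is encoded in the graph itself
add_edge(graph, load, store_buf0)

order = toposort(graph)
print(order)
```

Without the last `add_edge`, `store_buf0` would have no predecessors and a toposort could legally place it before `load_buf0`.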
converting to draft to try extending `ASSERT_COMPILE=1` (possibly with a wrapper).
I've tried a couple of options over the past two weeks and ASSERT_COMPILE=1 is the most productive and reliable way to do these tests. We can make progress on benchmarks...
`PROFILE=0 python3 test/external/external_benchmark_schedule.py`

master:
```
***** model forward in 19.03 ms
***** model schedule in 6.28 ms
***** model lower in 990.70 ms
```
upat:
```
***** model forward in...
```
remu needs new instructions. I think I've implemented pretty much all the formats, so they should be easy to add.
I've seen line 48 of https://github.com/Qazalin/remu/blob/master/src/lib.rs be very slow for large global sizes. I tried multithreading, but the overhead cost more than it saved.
so I'd first root-cause where the bottleneck is. The global dims are just a guess.
@nimlgen done, I'll test and release if it's faster. https://github.com/Qazalin/remu/commit/7789014197ca2d2291037c59ba4e8d9f77550774
FYI, I released this, though I didn't see a big perf change.