qazal

Results 187 comments of qazal

@0xtimmy also, does this diff allow for softmax fusion? I didn't find it in test_linearizer

what is the blocker to make this use virtual edges instead of another special toposort? we wanna make the `graph: DefaultDict[UOp, List[UOp]]` such that any toposort is valid.
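A minimal sketch of the virtual-edge idea, not tinygrad's actual scheduler code: plain strings stand in for `UOp`, and `add_virtual_edges` and `toposort` are hypothetical helper names. It assumes every ordering constraint that a special toposort would enforce can instead be written as an extra `(before, after)` edge, so that an ordinary Kahn's-algorithm toposort is already valid.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Iterable, List, Tuple

Graph = DefaultDict[str, List[str]]  # strings stand in for UOp (assumption)

def add_virtual_edges(graph: Graph, constraints: Iterable[Tuple[str, str]]) -> Graph:
    """Record each (before, after) ordering constraint as a real edge."""
    for before, after in constraints:
        if after not in graph[before]:
            graph[before].append(after)
    return graph

def toposort(graph: Graph) -> List[str]:
    """Plain Kahn's algorithm with no special cases for ordering constraints."""
    in_degree: DefaultDict[str, int] = defaultdict(int)
    nodes = set(graph)
    for children in graph.values():
        for child in children:
            in_degree[child] += 1
            nodes.add(child)
    # Start from every node that has no incoming edge.
    queue = deque(sorted(n for n in nodes if in_degree[n] == 0))
    order: List[str] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in graph.get(node, []):
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle")
    return order

# Data dependency: "reduce" consumes "load_old". "store" has no data edge.
graph: Graph = defaultdict(list, {"load_old": ["reduce"], "store": []})
# Hidden constraint: "store" overwrites the buffer, so "load_old" must run first.
add_virtual_edges(graph, [("load_old", "store")])
order = toposort(graph)
```

Without the virtual edge, `store` has no incoming edge and a valid toposort could schedule it before `load_old`. With the edge in the graph, every toposort respects the constraint and no second sorting pass is needed.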

converting to draft to try extending `ASSERT_COMPILE=1` (possibly with a wrapper).

I've tried a couple of options over the past two weeks and ASSERT_COMPILE=1 is the most productive and reliable way to do these tests. We can make progress on benchmarks...

`PROFILE=0 python3 test/external/external_benchmark_schedule.py`

master
```
***** model forward in 19.03 ms
***** model schedule in 6.28 ms
***** model lower in 990.70 ms
```
upat:
```
***** model forward in...
```

remu needs new instructions - I think I pretty much implemented all formats so should be easy to add them.

I've seen line 48 of https://github.com/Qazalin/remu/blob/master/src/lib.rs be very slow for large global sizes. I tried multithreading, but the overhead outweighed the gain.

so I'd first root-cause where the bottleneck is. The global dims are only a guess.

@nimlgen done, I'll test and release if it's faster. https://github.com/Qazalin/remu/commit/7789014197ca2d2291037c59ba4e8d9f77550774

I released this FYI - didn't see a huge perf change though