Alexander Root
Monday morning review ping
@steven-johnson I have a suspicion that it could be the result of an issue with the top-down type inference in IRMatch.h. Any chance you could try out the small change...
LLVM 11 on Linux and LLVM 10 on macOS; both experience the slowdown.
Clean checkouts on the Linux machine:

3e034d6:
```
Manually-tuned time: 0.52348ms
```
813eadc:
```
Manually-tuned time: 0.360595ms
```
Can confirm clean checkouts on the Mac later if necessary.
Hmm, it's not nearly as bad on the Mac.

3e034d6:
```
Manually-tuned time: 0.497357ms
```
813eadc:
```
Manually-tuned time: 0.426782ms
```
Ah sorry about that. Mac has 6, Linux has 24.
The main difference I'm finding in the generated assembly is the use of `halide_mutex_(unlock + yield + lock)` versus `halide_cond_wait`. Per @abadams, I set `max_spin_count=0` in src/runtime/thread_pool_common.h and am...
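For anyone following along, here is a rough sketch of the general spin-then-block pattern being discussed. This is not the actual code in thread_pool_common.h; `WorkQueue`, `wait_for_work`, and `post_work` are hypothetical names, and only the `max_spin_count` parameter is borrowed from the comment above. It just illustrates why a spin count of 0 would take the condition-variable path instead of the unlock + yield + lock path.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a worker's wait loop; not Halide's runtime code.
struct WorkQueue {
    std::mutex mutex;
    std::condition_variable wake;
    bool has_work = false;
};

// Waits until q.has_work is set, then consumes it.
// While the spin budget lasts, the waiter releases the lock, yields, and
// re-acquires the lock. Once the budget is used up, it blocks on the
// condition variable. Returns the number of spin iterations performed.
int wait_for_work(WorkQueue &q, int max_spin_count) {
    assert(max_spin_count >= 0);
    std::unique_lock<std::mutex> lock(q.mutex);
    int spins = 0;
    while (!q.has_work) {
        if (spins < max_spin_count) {
            // Spin path: unlock + yield + lock.
            lock.unlock();
            std::this_thread::yield();
            lock.lock();
            ++spins;
        } else {
            // Blocking path: sleep until notified (loop handles spurious wakeups).
            q.wake.wait(lock);
        }
    }
    q.has_work = false;
    return spins;
}

// Publishes one unit of work and wakes a single waiter.
void post_work(WorkQueue &q) {
    {
        std::lock_guard<std::mutex> guard(q.mutex);
        q.has_work = true;
    }
    q.wake.notify_one();
}
```

With `max_spin_count` set to 0 the spin branch is never taken, so every wait goes straight to the condition variable. That makes it a convenient switch for checking whether the spinning itself is responsible for a timing difference.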
A paper that built on TACO (and I believe appears in MLSys this year) has support for this type of convolution. The GitHub repo is here: https://github.com/nullplay/Unified-Convolution-Framework
This seems to be fixed - do we need to keep the issue open?
> Adding @rootjalex as reviewer specifically for XtensaOptimize, due to its similarity to HexagonOptimize

I am happy to take a look but am traveling this week, might be a few...