Alexander Root

Results: 63 comments by Alexander Root

@steven-johnson I have a suspicion that it could be the result of an issue with the top-down type inference in IRMatch.h. Any chance you could try out the small change...

LLVM 11 on Linux and LLVM 10 on macOS; both experience the slowdown.

Clean checkouts on Linux machine:

3e034d6:
```
Manually-tuned time: 0.52348ms
```
813eadc:
```
Manually-tuned time: 0.360595ms
```
Can confirm clean checkouts on the Mac later if necessary.

Hmm, it's not nearly as bad on the Mac.

3e034d6:
```
Manually-tuned time: 0.497357ms
```
813eadc:
```
Manually-tuned time: 0.426782ms
```

Ah sorry about that. Mac has 6, Linux has 24.

The main difference I'm finding in the generated assembly is the use of `halide_mutex_(unlock + yield + lock)` versus `halide_cond_wait`. Per @abadams, I set `max_spin_count=0` in `src/runtime/thread_pool_common.h` and am...

A paper that built on TACO (and I believe appears in MLSys this year) has support for this type of convolution. The GitHub repo is here: https://github.com/nullplay/Unified-Convolution-Framework

This seems to be fixed - do we need to keep the issue open?

> Adding @rootjalex as reviewer specifically for XtensaOptimize, due to its similarity to HexagonOptimize

I am happy to take a look but am traveling this week, might be a few...