Alexander Root comments

Results: 63 comments by Alexander Root

[x86] Separate vector instruction selection and CodeGen passes

Monday morning review ping

[x86] Separate vector instruction selection and CodeGen passes

@steven-johnson I have a suspicion that it could be the result of an issue with the top-down type inference in IRMatch.h. Any chance you could try out the small change...

Performance regression on depthwise_separable_conv

LLVM 11 on Linux and LLVM 10 on macOS both experience the slowdown.

Performance regression on depthwise_separable_conv

Clean checkouts on the Linux machine:

3e034d6:
```
Manually-tuned time: 0.52348ms
```
813eadc:
```
Manually-tuned time: 0.360595ms
```
I can confirm with clean checkouts on the Mac later if necessary.

Performance regression on depthwise_separable_conv

Hmm, it's not nearly as bad on the mac.

3e034d6:
```
Manually-tuned time: 0.497357ms
```
813eadc:
```
Manually-tuned time: 0.426782ms
```

Performance regression on depthwise_separable_conv

Ah sorry about that. Mac has 6, Linux has 24.

Performance regression on depthwise_separable_conv

The main difference I'm finding in the generated assembly is the use of `halide_mutex_(unlock + yield + lock)` versus `halide_cond_wait`. Per @abadams, I set `max_spin_count=0` in src/runtime/thread_pool_common.h and am...
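To make the two waiting strategies in the comment above concrete, here is a minimal sketch of a worker that first spins (unlock, yield, lock) and then falls back to a condition-variable wait. This is not Halide's runtime code: `WorkSignal`, `wait_for_work`, and `post_work` are hypothetical names, and the sketch assumes `max_spin_count` bounds the spin phase, which the comment does not spell out.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a thread pool's "is there work?" state.
// Not taken from Halide's thread_pool_common.h.
struct WorkSignal {
    std::mutex mutex;
    std::condition_variable cond;
    bool ready = false;
};

// Waits until s.ready is true. Spins at most max_spin_count times
// (unlock + yield + lock), then blocks on the condition variable.
// Returns the number of spin iterations actually performed.
int wait_for_work(WorkSignal &s, int max_spin_count) {
    assert(max_spin_count >= 0);
    std::unique_lock<std::mutex> lock(s.mutex);
    int spins = 0;

    // Spin phase: release the lock, let other threads run, re-acquire.
    while (!s.ready && spins < max_spin_count) {
        lock.unlock();
        std::this_thread::yield();
        lock.lock();
        ++spins;
    }

    // Blocking phase: sleep until a producer signals that work is ready.
    // With max_spin_count == 0 the spin phase is skipped entirely.
    s.cond.wait(lock, [&] { return s.ready; });
    return spins;
}

// Marks work as available and wakes any blocked waiters.
void post_work(WorkSignal &s) {
    {
        std::lock_guard<std::mutex> lock(s.mutex);
        s.ready = true;
    }
    s.cond.notify_all();
}
```

In this sketch, calling `wait_for_work(s, 0)` always goes straight to the blocking wait, which is the behavior the `max_spin_count=0` experiment is meant to isolate. A larger count trades wake-up latency against CPU time spent yielding.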

convolution

A paper that built on TACO (and, I believe, appears in MLSys this year) supports this type of convolution. The GitHub repo is here: https://github.com/nullplay/Unified-Convolution-Framework

HVX codegen error: "Cannot scavenge register without an emergency spill slot"

This seems to be fixed - do we need to keep the issue open?

Backport xtensa_codegen to main (v2)

> Adding @rootjalex as reviewer specifically for XtensaOptimize, due to its similarity to HexagonOptimize

I am happy to take a look but am traveling this week, might be a few...