Halide
Halide copied to clipboard
[x86] Codegen phaddw, phaddd, and pmaddwd
This PR adds support for int16 -> int32
horizontal widening adds to use pmaddwd
, and pattern matches on horizontal adds to use phadd(w | d)
, which is faster than the permute + padd that we currently generate for such reductions (cite @abadams + llvm-mca because my x86 machine is currently broken).
Also fly-by move of some code that I incorrectly placed in the wrong runtime file in #6677 .
The vector and scalar versions of op_phaddw_285 disagree. Maximum error: 60123
Failures were due to a shuffling mistake, I forgot that AVX2 has the 128 bit boundaries for instructions - fixed in 0acae70, and confirmed that we still produce better codegen for reductions.
There's a trade-off between adding new llvm ir implementations of things, and rewriting Halide Exprs into forms that would match existing patterns. E.g. for the widening add, you could rewrite the Expr so that it matches should_use_dot_product.
My question is: Did you consider that, and does this PR fall on the right side of that line?
We don't have any existing patterns for phadd
instructions, so I'll only respond to your question in regards to the use of pmaddwd
here.
I did not consider rewriting within Halide IR, but your suggestion raises a good point - I realize that I did not provide LLVM implementations for the AVX512 dot_product instructions, which could be similarly used. I suspect that your suggest would benefit from greater generality, and would be better suited for extensibility, so I will change this PR to transform all of the horizontal_widening_add
s into the pattern that will match against the existing dot_product patterns
I will say that I don't think these particular patterns are suitable for rewriting to enable should_use_dot_product
, but instead I will aim for producing the patterns that will match the VectorReduce dot_product instructions.
buildbots are green -- is this ready to land?
I might close this PR in favor of #6884 , which will make the horizontal_widening_add
patterns useful on any architecture that has pmaddwd
variants. Just waiting to get feedback on whether that PR is indeed a better way of doing things.
Monday Morning Review Ping -- where does this PR stand? Is it still likely to be dropped in favor of #6884?
Yes, I am closing this in favor of #6884 . If we end up not merging #6884 for some reason, I will re-open this PR.