
[AArch64] Generate sqdmlal

Open sjoerdmeijer opened this issue 4 years ago • 4 comments

Bugzilla Link 50653
Version trunk
OS Linux
CC @Arnaud-de-Grandmaison-ARM,@DMG862,@smithp35

Extended Description

Raising this missed optimisation opportunity in case someone finds this interesting.

For this input:

#include "arm_neon.h"

int32_t t_vqdmlalh_lane_s16 (int32_t a, int16_t b, int16x4_t c) {
    return vqdmlalh_lane_s16 (a, b, c, 0);
}

We do not generate the sqdmlal multiply-accumulate instruction that GCC generates:

t_vqdmlalh_lane_s16:
        dup     v2.4h, w1
        fmov    s1, w0
        sqdmlal s1, h2, v0.h[0]
        fmov    w0, s1
        ret

We get this instead:

t_vqdmlalh_lane_s16:                    // @t_vqdmlalh_lane_s16
        fmov    s1, w1
        sqdmull v0.4s, v1.4h, v0.4h
        fmov    s1, w0
        sqadd   s0, s1, s0
        fmov    w0, s0
        ret

See also https://godbolt.org/z/41nMxM5q1
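
For readers comparing the two sequences above, here is a minimal scalar sketch in portable C of what they compute. The names sat32, model_sqdmull, model_sqadd and model_sqdmlal are hypothetical helpers invented for this illustration; they are not LLVM or ACLE APIs. The sketch assumes the usual architectural behaviour: the doubled product is saturated to 32 bits first, and the accumulation is saturated again, which is why the split sqdmull + sqadd pair and the fused sqdmlal give the same result.

```c
#include <stdint.h>

/* Clamp a 64-bit value to the int32_t range (signed saturation). */
static inline int32_t sat32(int64_t x) {
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Models one lane of sqdmull: saturating doubling multiply, 16 -> 32 bits. */
static inline int32_t model_sqdmull(int16_t b, int16_t c) {
    return sat32(2 * (int64_t)b * (int64_t)c);
}

/* Models scalar sqadd: saturating 32-bit add. */
static inline int32_t model_sqadd(int32_t x, int32_t y) {
    return sat32((int64_t)x + (int64_t)y);
}

/* Models scalar sqdmlal: the product is saturated before it is accumulated,
   and the sum is saturated again. */
static inline int32_t model_sqdmlal(int32_t a, int16_t b, int16_t c) {
    int32_t product = sat32(2 * (int64_t)b * (int64_t)c);
    return sat32((int64_t)a + (int64_t)product);
}
```

The intermediate saturation is observable: with a = -1 and b = c = INT16_MIN, the doubled product 2^31 is clamped to INT32_MAX before a is added, so the model returns INT32_MAX - 1 rather than INT32_MAX.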

sjoerdmeijer, Jun 10 '21 09:06

@llvm/issue-subscribers-good-first-issue

llvmbot, Mar 30 '22 23:03

I am looking into this.

There are equivalent missed optimization opportunities with the following intrinsics too:

  • vqdmlalh_s16
  • vqdmlslh_s16
  • vqdmlalh_laneq_s16
  • vqdmlslh_lane_s16
  • vqdmlslh_laneq_s16
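
As a rough guide to how the intrinsics in this list relate to each other, here is a scalar sketch in portable C of the subtract and lane forms. The ref_* functions and saturate_s32 are hypothetical names made up for this example, and plain arrays stand in for int16x4_t (4 lanes) and int16x8_t (8 lanes). The assumption is that the vqdmlsl forms subtract the saturated doubled product instead of adding it, and that the lane/laneq forms only differ in where the second multiplicand comes from.

```c
#include <stdint.h>

/* Clamp a 64-bit value to the int32_t range (signed saturation). */
static inline int32_t saturate_s32(int64_t x) {
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Hypothetical model of vqdmlslh_s16: a - sat(2 * b * c), saturated. */
static inline int32_t ref_vqdmlslh_s16(int32_t a, int16_t b, int16_t c) {
    int32_t product = saturate_s32(2 * (int64_t)b * (int64_t)c);
    return saturate_s32((int64_t)a - (int64_t)product);
}

/* Hypothetical model of vqdmlslh_lane_s16: v stands in for a 4-lane vector,
   lane must be in 0..3. */
static inline int32_t ref_vqdmlslh_lane_s16(int32_t a, int16_t b,
                                            const int16_t v[4], int lane) {
    return ref_vqdmlslh_s16(a, b, v[lane]);
}

/* Hypothetical model of vqdmlslh_laneq_s16: v stands in for an 8-lane vector,
   lane must be in 0..7. */
static inline int32_t ref_vqdmlslh_laneq_s16(int32_t a, int16_t b,
                                             const int16_t v[8], int lane) {
    return ref_vqdmlslh_s16(a, b, v[lane]);
}
```

The vqdmlal counterparts would follow the same shape with the subtraction replaced by an addition.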

overmighty, Aug 04 '22 13:08

I found a fix for this issue.

I also found that sqdmlal/sqdmlsl instructions are in fact generated from the following intrinsics, as long as the lane number passed is not 0:

  • vqdmlalh_lane_s16
  • vqdmlalh_laneq_s16
  • vqdmlslh_lane_s16
  • vqdmlslh_laneq_s16

For example, for this C code:

int32_t u_vqdmlalh_lane_s16(int32_t a, int16_t b, int16x4_t v) {
    return vqdmlalh_lane_s16(a, b, v, 1);
}

Clang generates this AArch64 assembly code:

u_vqdmlalh_lane_s16:                    // @u_vqdmlalh_lane_s16
        fmov    s1, w1
        fmov    s2, w0
        sqdmlal v2.4s, v1.4h, v0.h[1]
        fmov    w0, s2
        ret

See https://godbolt.org/z/fYM6G1TcM for all my experiments.

The different kind of DAG generated when the lane number is not 0 is matched by this TableGen definition:

https://github.com/llvm/llvm-project/blob/f8d976171f2a1b7bf9268929f77904973edb0378/llvm/lib/Target/AArch64/AArch64InstrFormats.td#L8850-L8862

overmighty, Aug 07 '22 19:08

IIUC there's a candidate patch: https://reviews.llvm.org/D131700

fhahn, Aug 14 '22 16:08