[AArch64] Generate sqdmlal
| | |
| --- | --- |
| Bugzilla Link | 50653 |
| Version | trunk |
| OS | Linux |
| CC | @Arnaud-de-Grandmaison-ARM, @DMG862, @smithp35 |
Extended Description
Raising this missed optimisation opportunity in case someone finds this interesting.
For this input:
```c
#include "arm_neon.h"

int32_t t_vqdmlalh_lane_s16 (int32_t a, int16_t b, int16x4_t c) {
  return vqdmlalh_lane_s16 (a, b, c, 0);
}
```
We are not generating this multiply-accumulate variant that gcc generates:
```asm
t_vqdmlalh_lane_s16:
        dup     v2.4h, w1
        fmov    s1, w0
        sqdmlal s1, h2, v0.h[0]
        fmov    w0, s1
        ret
```
We get this instead:
```asm
t_vqdmlalh_lane_s16:                    // @t_vqdmlalh_lane_s16
        fmov    s1, w1
        sqdmull v0.4s, v1.4h, v0.4h
        fmov    s1, w0
        sqadd   s0, s1, s0
        fmov    w0, s0
        ret
```
See also https://godbolt.org/z/41nMxM5q1
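For reference, SQDMLAL performs a signed saturating doubling multiply-add: the product of the two halfword operands is doubled and saturated, then saturating-added into the 32-bit accumulator. Below is a minimal scalar model of what t_vqdmlalh_lane_s16 computes (a sketch based on the architectural definition; the helper names are illustrative, not from this report):

```c
#include <stdint.h>

/* Saturate a 64-bit value to the signed 32-bit range. */
static int32_t sat_s32(int64_t v) {
  if (v > INT32_MAX) return INT32_MAX;
  if (v < INT32_MIN) return INT32_MIN;
  return (int32_t)v;
}

/* Reference model of vqdmlalh_lane_s16(a, b, c, lane):
 * double the b * c[lane] product with saturation (sqdmull),
 * then saturating-add it into the accumulator (sqadd). */
int32_t ref_vqdmlalh_lane_s16(int32_t a, int16_t b, const int16_t c[4],
                              int lane) {
  int64_t dbl = 2 * (int64_t)b * (int64_t)c[lane];
  return sat_s32((int64_t)a + sat_s32(dbl));
}
```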
I am looking into this.
There are equivalent missed optimization opportunities with the following intrinsics too:
- vqdmlalh_s16
- vqdmlslh_s16
- vqdmlalh_laneq_s16
- vqdmlslh_lane_s16
- vqdmlslh_laneq_s16
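For illustration, here is one reproducer per intrinsic (a sketch; the function names are mine, not from the report). Each should be selectable as a single sqdmlal/sqdmlsl but currently is not:

```c
#include "arm_neon.h"

int32_t t_vqdmlalh_s16(int32_t a, int16_t b, int16_t c) {
  return vqdmlalh_s16(a, b, c);
}

int32_t t_vqdmlslh_s16(int32_t a, int16_t b, int16_t c) {
  return vqdmlslh_s16(a, b, c);
}

int32_t t_vqdmlalh_laneq_s16(int32_t a, int16_t b, int16x8_t c) {
  return vqdmlalh_laneq_s16(a, b, c, 0);
}

int32_t t_vqdmlslh_lane_s16(int32_t a, int16_t b, int16x4_t c) {
  return vqdmlslh_lane_s16(a, b, c, 0);
}

int32_t t_vqdmlslh_laneq_s16(int32_t a, int16_t b, int16x8_t c) {
  return vqdmlslh_laneq_s16(a, b, c, 0);
}
```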
I found a fix for this issue.
I also found that sqdmlal/sqdmlsl instructions are in fact generated from the following intrinsics, as long as the lane number passed is not 0:
- vqdmlalh_lane_s16
- vqdmlalh_laneq_s16
- vqdmlslh_lane_s16
- vqdmlslh_laneq_s16
For example, for this C code:
```c
int32_t u_vqdmlalh_lane_s16(int32_t a, int16_t b, int16x4_t v) {
  return vqdmlalh_lane_s16(a, b, v, 1);
}
```
Clang generates this AArch64 assembly code:
```asm
u_vqdmlalh_lane_s16:                    // @u_vqdmlalh_lane_s16
        fmov    s1, w1
        fmov    s2, w0
        sqdmlal v2.4s, v1.4h, v0.h[1]
        fmov    w0, s2
        ret
```
See https://godbolt.org/z/fYM6G1TcM for all my experiments.
When the lane number is not 0, the generated DAG has a different shape, which is matched by this TableGen definition:
https://github.com/llvm/llvm-project/blob/f8d976171f2a1b7bf9268929f77904973edb0378/llvm/lib/Target/AArch64/AArch64InstrFormats.td#L8850-L8862
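A plausible reading (my hypothesis, not confirmed in the report): an extract of lane 0 is canonicalized away during DAG combining, so the vector_extract node that this indexed pattern expects is no longer present. Writing the lane-0 extract by hand gives a semantically equivalent function that, under that hypothesis, should hit the same suboptimal sqdmull + sqadd sequence:

```c
#include "arm_neon.h"

/* Equivalent formulation of t_vqdmlalh_lane_s16 with an explicit
 * lane-0 extract (function name is illustrative, not from the report). */
int32_t x_vqdmlalh_lane0_s16(int32_t a, int16_t b, int16x4_t c) {
  return vqdmlalh_s16(a, b, vget_lane_s16(c, 0));
}
```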
IIUC there's a candidate patch: https://reviews.llvm.org/D131700