toolchain ARCEB: linux kernel build error: operand out of range (-132 is not between -128 and 127)

I've reproduced the Linux kernel build issue with the arceb compiler, described here: http://lists.infradead.org/pipermail/linux-snps-arc/2023-August/007525.html

To reproduce this issue, I used gcc 12.2 from the arc-2023.03 toolchain release (Linux/glibc ARC HS Big Endian).

With a certain set of options, the compiler emits a short branch instruction (bne_s) with an out-of-range relative offset, and the assembler then prints the following error:

refscale.s: Assembler messages:
refscale.s:1917: Error: operand out of range (-132 is not between -128 and 127)

The refscale.zip archive contains the pre-processed source (refscale.i). The issue can be reproduced from this file with the following command:

arceb-linux-gcc -save-temps -mlock -mno-ll64 -mmedium-calls -mcpu=hs38 -fno-inline-functions-called-once -fconserve-stack -Os -mlong-calls -c ./refscale.i -o refscale.o

The little-endian compiler with -mbig-endian also shows this issue.

Aug 17 '23 10:08 pavelvkozlov

A reduced set would be:

int  m(void);
int  n(long long, int);
void p(int, int, int, int);
void o(void);
void q(const char *);
void r(char *);

static int a, b = 30, c, e, g;
int d = (int) &c;
long long *h;
long long k();

int l() {
  long long j;
  int f;
  char i;
  f = 0;
  for (; f < b && m(); f++) {
    g = 0;
    for (; a;)
      for (;;)
        ;
    j = k();
    h[f] = n(1000 * j, a * 10000);
  }
  f = 0;
  for (; f < b; f++)
    o();
  if (c)
    p(e, 0, 1, 0);
  q("");
  r(&i);
}

s()
{
    a = b = 0;
}

and build with:

$ arc-elf32-gcc -mbig-endian                      \
                -mno-ll64                         \
                -mcpu=hs38                        \
                -fno-inline-functions-called-once \
                -fconserve-stack                  \
                -Os                               \
                -mlong-calls                      \
                -c                                \
                ./test.c                          \
                -o /dev/null

Aug 21 '23 13:08 shahab-vahedi

The relevant assembly output:

.L8:
        mov_s   r13,0
        .align  2
.L6:
        ld_s    r0,[r14]
        brgt    r0, r13, @.L9
        mov_s   r0,@.LANCHOR0
        ld_s    r0,[r0,4]
        breq_s  r0, 0, @.L10
        mov_s   r3,0
        mov_s   r2,1
        mov_s   r1,0
        mov_s   r0,0
        jl      @p
        .align  2
.L10:
        mov_s   r0,@.LC0
        jl      @q
        add     r0,sp,3
        jl      @r
        add_s   sp,sp,4
        leave_s {r13-r14, blink, pcl}
        .align  2
.L3:
        jl      @k
        mpy     r3,r0,1000
        ld_s    r2,[r14]
        mpydu   r0,r1,1000
        add_s   r0,r3,r0
        mpy     r2,r2,10000
        jl      @n
        ld_s    r1,[gp,@h@sda]
        add3_s  r1,r1,r13
        add_s   r13,r13,1
        st_s    r0,[r1,4]
        asr_s   r0,r0,31
        st_s    r0,[r1]
        b_s     @.L2
        .align  2
.L5:
        jl      @m
        breq_s  r0, 0, @.L8     <----- The problematic range

Aug 30 '23 05:08 shahab-vahedi

The relevant diff between the little-endian and big-endian assembly output:

  .-----------------------.-----------------------.
  |     little endian     |       big endian      |
  |-----------------------+-----------------------|
  |  mpy     r3,r1,1000   |  mpy     r3,r0,1000   |
  |  ld_s    r2,[r14]     |  ld_s    r2,[r14]     |
  |  mpydu   r0,r0,1000   |  mpydu   r0,r1,1000   |
  |  add_s   r1,r3,r1     |  add_s   r0,r3,r0     |
  |  mpy     r2,r2,10000  |  mpy     r2,r2,10000  |
  |  jl @n                |  jl @n                |
  `-----------------------^-----------------------'

The problem is the mpydu instruction. In little endian form, it's mpydu b,b,s12 which is 4 bytes long. However, in big endian output, it's mpydu a,b,limm which is 8 bytes long. Fixing the length costs in GCC should fix this issue.
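
To make the arithmetic concrete, here is a minimal sketch of how a 4-byte undercount turns an in-range branch into an out-of-range one. It assumes that exactly one mis-sized instruction lies inside the branch span and that the compiler counted it as 4 bytes; the helper names are hypothetical and are not taken from GCC or binutils. The range limits are the ones printed in the assembler error.

```c
#include <stdbool.h>

/* Range reported by the assembler error for the short branch. */
enum { SHORT_BRANCH_MIN = -128, SHORT_BRANCH_MAX = 127 };

/* True if a relative offset fits the short branch encoding. */
static bool fits_short_branch(int offset)
{
    return offset >= SHORT_BRANCH_MIN && offset <= SHORT_BRANCH_MAX;
}

/*
 * Offset the compiler would have computed for a backward branch
 * (negative offset) if it counted one instruction in the span as
 * assumed_len bytes while the assembler emitted real_len bytes.
 * Assumption: only one instruction in the span is mis-sized.
 */
static int estimated_offset(int actual_offset, int assumed_len, int real_len)
{
    return actual_offset + (real_len - assumed_len);
}
```

With the numbers from this issue, estimated_offset(-132, 4, 8) is -128, which fits_short_branch accepts, while the real offset of -132 is rejected. That would explain why the compiler felt safe picking the short form.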

Thanks @claziss for pointing me in the right direction.

Aug 30 '23 06:08 shahab-vahedi

gcc indeed considers the mpydu r0, r1, ... instruction to be 4 bytes long:

$ arc-elf32-gcc ... -mbig-endian ... -dp ...
  ...
  mpydu   r0,r1,1000  # 29    [c=4 l=4]  mpydu_imm_arcv2hs/1
  ...

Variant 1 of mpydu_imm_arcv2hs corresponds to the r, 0, I constraint, which has a length of 4.

$ cat /src/gcc/gcc/config/arc/arc.md
  ...
  (define_insn "mpyd<su_optab>_imm_arcv2hs"
  [(set (match_operand:DI 0 "even_register_operand"             "=r,r,  r")
        (mult:DI (SEZ:DI (match_operand:SI 1 "register_operand"  "r,0,  r"))
                 (match_operand 2            "immediate_operand" "L,I,Cal")))
  ...
  [(set_attr "length" "4,4,8")

At first glance, the real problem here is that mpydu r0, r1, 1000 should have been mapped to r, r, Cal rather than r, 0, I. The tricky part is that, for big endian, gcc sees the assignment in this form:

(r0)r1 = mpydu(r1, 1000)

So, in its eyes, the source and destination registers are the same (r1). This would be fine if ARC's ISA allowed r1 to be encoded as the indicator for the destination register pair in big endian (r0r1). However, the ISA still expects the same indicator as for the little-endian register pair (r1r0), which is r0.
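
The mismatch can be modelled roughly as follows. This is only a sketch of the reasoning above, not GCC's actual matching code: it assumes that a 64-bit value occupies an even/odd register pair, that gcc compares the 32-bit source against the register holding the low word of the pair, and that the low word sits in the odd register on big endian. All function names are hypothetical.

```c
#include <stdbool.h>

/* Register holding the low 32 bits of a pair that starts at the even
   register `base` (assumed model: the odd register on big endian). */
static int low_word_reg(int base, bool big_endian)
{
    return big_endian ? base + 1 : base;
}

/* Register number the instruction encoding uses to name the pair:
   the even base register, regardless of endianness. */
static int encoded_pair_reg(int base)
{
    return base;
}

/* What the matching "0" constraint effectively sees in this model. */
static bool gcc_sees_same_reg(int dest_base, int src_reg, bool big_endian)
{
    return src_reg == low_word_reg(dest_base, big_endian);
}

/* The b,b,s12 form is only encodable when the source register is the
   one that names the destination pair. */
static bool bb_form_encodable(int dest_base, int src_reg)
{
    return src_reg == encoded_pair_reg(dest_base);
}
```

For the big-endian case in this issue (destination pair r0, source r1), gcc_sees_same_reg(0, 1, true) holds but bb_form_encodable(0, 1) does not, so the 4-byte form gets selected even though it cannot be emitted. For the little-endian output (mpydu r0,r0,1000), both checks agree.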

Aug 30 '23 08:08 shahab-vahedi

Proposed fix here: https://github.com/foss-for-synopsys-dwc-arc-processors/gcc/commit/b974ff374d51e03d99271d6adde8eb39490a0185

Aug 31 '23 05:08 claziss

Proposed fix here: foss-for-synopsys-dwc-arc-processors/gcc@b974ff3

The proposed fix disables the b,b,... format when dealing with a big endian target for the following instructions:

macd(u)
mpyd{s,u}_arcv2hs

vmac2h(u)
vmpy2h(u)

Aug 31 '23 11:08 shahab-vahedi
