abi-aa Incorrect overflow check in group relocations

5.7.8 Group Relocations in aaelf64.rst states:

"Generating the field for a Gn relocation directive starts by examining the residual value Yn after the bits of abs(X) corresponding to less significant fields have been masked off from X. If M is the mask specified in the table recording the relocation directive, Yn = abs(X) & ~((M & -M) - 1).

Overflow checking is performed on Yn unless the name of the relocation ends in "_NC"."

This is incorrect - the overflow check must be done on X, not on abs(X) since that creates an overflow bug. The overflow check does not need lower bits to be removed either. This paragraph can be removed since relocations already show what overflow checks to perform on X.

Mar 22 '21 15:03 Wilco1

I agree, this looks wrong. (A re-framing of the problem to help anyone looking at this later: imagine a signed _G0 relocation. if X is -2^16 then the NOT pattern of that can fit into the bit field these relocations act on. If X is 2^16 that pattern does not fit in the bitfield. All the tables account for the change in this absolute value limit between X < 0 and X >= 0. This paragraph does not.)

I also think that the paragraph after it can be confusing:

Finally the bit-field of X specified in the table (those bits of X picked out by 1-bits in M) is encoded into the instruction’s literal field as specified in the table. In some cases other instruction bits may need to be changed according to the sign of X.

This doesn't seem to account for the NOT that is applied to signed values (unless that's what is meant by "as specified in the table").

On top of this, the table describing GOT-relative inline group relocations don't mention the overflow checking limits.

I think I'd prefer to re-write the paragraphs rather than remove them. I have found summary descriptions giving the "general idea" of things useful in the past.

Mar 23 '21 13:03 mmalcomson

I agree a summary description that explains how to use these relocations with an example would be great. Key things that would be useful to point out:

The checking relocation must go first as this is always a MOVZ or MOVN, followed by non-checking MOVK relocations for the lower bits.
Signed checking MOVW relocations will change a MOVZ into a MOVN if X < 0 and invert the encoded bits (and these are the only bits changed by the linker)

Mar 23 '21 13:03 Wilco1

Haven't had too much time to go through the details. Just adding one thing that came up that would be good to address. LLVM's HWASAN uses one of the relocations https://reviews.llvm.org/D65364 in a way that isn't permitted by the ABI (currently), but looks sensible enough that it could be used.

  // MO_TAGGED on the page indicates a tagged address. Set the tag now.
  // We do so by creating a MOVK that sets bits 48-63 of the register to
  // (global address + 0x100000000 - PC) >> 48. This assumes that we're in
  // the small code model so we can assume a binary size of <= 4GB, which
  // makes the untagged PC relative offset positive. The binary must also be
  // loaded into address range [0, 2^48). Both of these properties need to
  // be ensured at runtime when using tagged addresses.

adrp x0, :pg_hi21_nc:global movk x0, #:prel_g3:global+4294967296 add x0, x0, :lo12:global

Mar 23 '21 13:03 smithp35

Using prel on an absolute address is weird and non-intuitive. It would be more reasonable to do it like:

adrp x0, :pg_hi21_nc: global
movk x0, :abs_g3: global
add  x0, x0, :lo12: global

The ADRP sets the right absolute address modulo 2^48, the MOVK then sets the top 16 bits.

Mar 23 '21 14:03 Wilco1

Using prel on an absolute address is weird and non-intuitive. It would be more reasonable to do it like:
adrp x0, :pg_hi21_nc: global
movk x0, :abs_g3: global
add  x0, x0, :lo12: global
The ADRP sets the right absolute address modulo 2^48, the MOVK then sets the top 16 bits.

Are you sure that is the same calculation? My understanding is that this stashing a HWASAN tag in the top 16 bits, derived from ((global address + 0x100000000 - PC) >> 48), which would be in the form (S + A) - P, a relative relocation.

I don't think :abs_g3: would evaluate to the same value as above.

Mar 23 '21 18:03 smithp35

I may be missing the point completely, but I think it's the same calculation once the runtime limits are enforced.

The equation (global address + 0x100000000 - PC) would have to be tagof(global address) + [something less than 8GB] due to the limit on the size of the binary. Hence, just taking the top 16 bits would just give the tag stored in the top byte of the global (no matter where the PC and symbol are is).

I believe the limit on the range 2^48 is there to ensure that the adrp does not need to store any bits in [63:48], ensuring that this calculation (which is guaranteed to set bits [55:48] to zero if the tag is in the top byte) doesn't override any bits relevant to the address and not the tag.

I agree that using prel on a part of an address that you want to copy across directly is a bit wierd. I guess that the prel_g3 relocation is used because the linker wouldn't accept an :abs_g3: on a symbol that is relocatable, and I would agree with that too. (I guess it's just a bit different that there's a use case has a symbol containing a load-time determined value but where the top two bytes which are compile-time determined). EDIT -- If that is the case -- I don't know if the tag is determined at compile time or load-time or runtime.

IIUC the modification of the ABI to make that use-case work is to avoid switching a movz to movn for the :prel_g3: relocation. I don't think this looks like much of a problem, but it seems like making a very special case for a special reason. (Not only having the g3 version behave differently for negative values than the g{2,1,0}, but also having the prel_g3 do it's group part slightly differently to the gotoff_g3).

I'd like to do a bit of double-checking a) that really no-one relies on the sign bit going into the less significant bits of the register, and b) if so, would there be any pushback against making the gotoff_g3 behave the same way.

Mar 24 '21 12:03 mmalcomson

Yes both relocation examples behave identically. You're right, the abs_g3 relocation is not allowed in shared objects, so using prel_g3 is the only option there.

If we change prel_g3, it should apply to all _g3 relocations, so gotoff_g3 as well. There are 2 options:

Change to MOV[KZ] relocation like MOVW_UABS_G3
Change to MOV[KNZ] relocation (backwards compatible with existing MOV[NZ] relocation)

I don't see how anyone could rely on MOVN being generated for prel_g3 unless you use the same trick on a sequence of prel_g3/g0/g1 (ie. skipping prel_g2 and relying on sign extension), but the 2nd option allows full backwards compatibility with existing prel_g3 behaviour.

Apr 16 '21 14:04 Wilco1