ArduinoCore-samd icon indicating copy to clipboard operation
ArduinoCore-samd copied to clipboard

delayMicroseconds in complicated code causes compilation to fail

Open EmbeddedG opened this issue 3 years ago • 5 comments

Hello!

Compilation fails with the error message:

maxtron_translator.elf.ltrans0.s: Assembler messages:
maxtron_translator.elf.ltrans0.s:14826: Error: lo register required -- `sub r8,#1'

In this code:

  __asm__ __volatile__(
    "1:              \n"
    "   sub %0, #1   \n" // substract 1 from %0 (n)
    "   bne 1b       \n" // if result is not 0 jump to 1
    : "+r" (n)           // '%0' is n variable with RW constraints

%0 gets assigned to a high register (r8-r15) in my case. I suspect the reason is that gcc embeds this assembly in a complicated code using many registers (which is caused by LTO combining large parts of the code into single "functions"). Cortex-M0 doesn't support the sub/subs instruction with every register/immediate combination.

Modifying the register constraints so that only a lower register is suitable for %0 solves the problem for me:

  __asm__ __volatile__(
    "1:              \n"
    "   sub %0, #1   \n" // substract 1 from %0 (n)
    "   bne 1b       \n" // if result is not 0 jump to 1
    : "+l" (n)           // '%0' is n variable with RW constraints (only Lo register is allowed)

SAMD core version: 1.8.9 GCC version: 9.3.1 (I need some C++17 features)

EmbeddedG avatar Mar 31 '21 16:03 EmbeddedG

With the unsigned wait it is possible to wait for about 72 minutes. But I don't think it 's used for such long times. So a possible fix would be to use a signed value (still able to delay for 36 minutes), negate it in the function and use add instead of sub. Ok, as I am not an ARM assembly expert, I just assume that add also sets the zero flag

Reneg973 avatar Apr 08 '21 21:04 Reneg973

@EmbeddedG: You could try it? Could you also check whether the cortex-m0 flag is given to the compiler (command line)? If yes, it's a compiler issue.

Reneg973 avatar Apr 09 '21 06:04 Reneg973

With the unsigned wait it is possible to wait for about 72 minutes. But I don't think it 's used for such long times. So a possible fix would be to use a signed value (still able to delay for 36 minutes), negate it in the function and use add instead of sub. Ok, as I am not an ARM assembly expert, I just assume that add also sets the zero flag

Why do you think it is a better idea to change the function signature in the core library than changing a little detail in the function implementation? I can be convinced, if you have a good reason :)

I don't think it's a good idea to change the function signature, that could break code using it. As far as I know, delayMicroseconds accepts unsigned int parameter for every core, so I wouldn't change it only for SAMD. My fix only modifies the "+r" constraint to "+l" in the function body, no change is seen from outside.

I checked the ARM architecture reference, the Cortex M0 manuals and several forums about GCC, unified/Thumb2 assembly and it is still not clear for me whether I could get this to work any other way. I like ARM assembly and sometimes I even write assembly routines, but I got a little lost here. Some other people also stated that the proper handling of sub with Cortex-M0 and unified assembly is not clear from GCC and ARM documentation. But forcing the compiler to use a Lo register always gets this working. This may be a real bug and it may have been there since forever, but it only causes problems if the code uses many registers - and I think that very complicated code is not that common around delayMicroseconds in Arduino. So I can imagine, that this bug occurs very rarely, and with a mysterious error message saying nothing about the location of the original error (delayMicrosends is always_inline) - and I could only pinpoint that the issue was in delayMicroseconds, because I tried and could investigate the intermediate assembly code produced by GCC.

EmbeddedG avatar Apr 09 '21 17:04 EmbeddedG

excuse me, didn't check this completely. Why don't you make a pull request for this? I think it's a good idea to just use general purpose registers in case of 'sub'. Anyway it must also be a bug in the compiler. Or maybe the "+r" tricks it.

Reneg973 avatar Apr 09 '21 19:04 Reneg973

No, it's not a compiler bug. ARMv6-M is restricted to a subset of Thumb-2 opcodes, and in particular SUB-immediate only takes low registers. See ARMv6-M Architecture Reference Manual section A6.7.65 "SUB (immediate)". As of writing, rev C is on the web but rev E is only available as a PDF (DDI0419E).

So @EmbeddedG's solution is correct: the asm constraint should have been "l+" all along. (See GCC's docs on machine-specific asm constraints, scroll down to "ARM family".)

Related: adafruit/Adafruit_BusIO#102

I'm not sure if anyone will be hitting this issue again since 0a55883 no longer inlines delayMicroseconds() (unlike in Adafruit's fork). It's still technically wrong to specify "r+", though, since the compiler makes no guarantees here--however unlikely it is that GCC will start allocating high registers in a small leaf function.

gsamudra avatar Nov 05 '23 12:11 gsamudra