ArduinoCore-samd
ArduinoCore-samd copied to clipboard
delayMicroseconds in complicated code causes compilation to fail
Hello!
Compilation fails with the error message:
maxtron_translator.elf.ltrans0.s: Assembler messages:
maxtron_translator.elf.ltrans0.s:14826: Error: lo register required -- `sub r8,#1'
In this code:
__asm__ __volatile__(
"1: \n"
" sub %0, #1 \n" // substract 1 from %0 (n)
" bne 1b \n" // if result is not 0 jump to 1
: "+r" (n) // '%0' is n variable with RW constraints
%0 gets assigned to a high register (r8-r15) in my case. I suspect the reason is that gcc embeds this assembly in a complicated code using many registers (which is caused by LTO combining large parts of the code into single "functions"). Cortex-M0 doesn't support the sub/subs instruction with every register/immediate combination.
Modifying the register constraints so that only a lower register is suitable for %0 solves the problem for me:
__asm__ __volatile__(
"1: \n"
" sub %0, #1 \n" // substract 1 from %0 (n)
" bne 1b \n" // if result is not 0 jump to 1
: "+l" (n) // '%0' is n variable with RW constraints (only Lo register is allowed)
SAMD core version: 1.8.9 GCC version: 9.3.1 (I need some C++17 features)
With the unsigned wait it is possible to wait for about 72 minutes. But I don't think it 's used for such long times. So a possible fix would be to use a signed value (still able to delay for 36 minutes), negate it in the function and use add instead of sub. Ok, as I am not an ARM assembly expert, I just assume that add also sets the zero flag
@EmbeddedG: You could try it? Could you also check whether the cortex-m0 flag is given to the compiler (command line)? If yes, it's a compiler issue.
With the unsigned wait it is possible to wait for about 72 minutes. But I don't think it 's used for such long times. So a possible fix would be to use a signed value (still able to delay for 36 minutes), negate it in the function and use add instead of sub. Ok, as I am not an ARM assembly expert, I just assume that add also sets the zero flag
Why do you think it is a better idea to change the function signature in the core library than changing a little detail in the function implementation? I can be convinced, if you have a good reason :)
I don't think it's a good idea to change the function signature, that could break code using it. As far as I know, delayMicroseconds
accepts unsigned int
parameter for every core, so I wouldn't change it only for SAMD. My fix only modifies the "+r" constraint to "+l" in the function body, no change is seen from outside.
I checked the ARM architecture reference, the Cortex M0 manuals and several forums about GCC, unified/Thumb2 assembly and it is still not clear for me whether I could get this to work any other way. I like ARM assembly and sometimes I even write assembly routines, but I got a little lost here. Some other people also stated that the proper handling of sub with Cortex-M0 and unified assembly is not clear from GCC and ARM documentation. But forcing the compiler to use a Lo register always gets this working.
This may be a real bug and it may have been there since forever, but it only causes problems if the code uses many registers - and I think that very complicated code is not that common around delayMicroseconds
in Arduino. So I can imagine, that this bug occurs very rarely, and with a mysterious error message saying nothing about the location of the original error (delayMicrosends
is always_inline) - and I could only pinpoint that the issue was in delayMicroseconds
, because I tried and could investigate the intermediate assembly code produced by GCC.
excuse me, didn't check this completely. Why don't you make a pull request for this? I think it's a good idea to just use general purpose registers in case of 'sub'. Anyway it must also be a bug in the compiler. Or maybe the "+r" tricks it.
No, it's not a compiler bug. ARMv6-M is restricted to a subset of Thumb-2 opcodes, and in particular SUB-immediate only takes low registers. See ARMv6-M Architecture Reference Manual section A6.7.65 "SUB (immediate)". As of writing, rev C is on the web but rev E is only available as a PDF (DDI0419E).
So @EmbeddedG's solution is correct: the asm
constraint should have been "l+"
all along. (See GCC's docs on machine-specific asm
constraints, scroll down to "ARM family".)
Related: adafruit/Adafruit_BusIO#102
I'm not sure if anyone will be hitting this issue again since 0a55883 no longer inlines delayMicroseconds()
(unlike in Adafruit's fork). It's still technically wrong to specify "r+"
, though, since the compiler makes no guarantees here--however unlikely it is that GCC will start allocating high registers in a small leaf function.