cortex-m icon indicating copy to clipboard operation
cortex-m copied to clipboard

asm::delay() only blocks half the time intended on an STM32H7

Open nkrackow opened this issue 2 years ago • 6 comments

The asm::delay() function delays a program only by half the time/CPU cycles intended on an STM32H7. I have prepared a somewhat minimal example here. My CPU clock is at 400 MHz and I specify a delay of 4_000_000_000 cycles which should lead to a 10 second delay. The actual delay is 5 seconds for me.

I have also confirmed the same behavior in other programs on an STM32H7 with much shorter timescales and looking at IOs on a scope. The delay is always exactly half the time expected.

One guess is that this is due to the dual issue pipeline of the H7 eating up two instructions per cycle.

nkrackow avatar Apr 13 '22 11:04 nkrackow

Oh I forgot to mention that I tried this with 0.7.4 and the current master. In my example repo I patch in the current master.

nkrackow avatar Apr 13 '22 13:04 nkrackow

Hm, that's annoying, we deliberately double the number of cycles delay() blocks for in #312 to accommodate the dual issue pipeline of the Cortex-M7 (#236); in fact this led to complaints about delaying too long on Cortex-M3/4 in #325.

I vaguely recall checking this on an stm32h7 and getting the right result, I'll see if I can dig it out and investigate a bit further. I guess we could read CPUID at runtime to try and work out what CPU we're on, but even then things like whether execution is from flash or RAM or XIP (Q)SPI flash, whether i-cache is on, whether a flash accelerator is configured, etc etc etc will all heavily influence the number of cycles. The current calculation is a two-instruction loop and it runs half the specified number of cycles of iterations, i.e. it will retire as many instructions as requested delay.

It might be that the best fix we can do is make the documentation even more vague on precisely how many cycles of delay you'll get...

adamgreig avatar Apr 14 '22 11:04 adamgreig

Sorry.. ;) I mean please do double check my example or try with other code. I tired to exclude other variables where possible and had some discussion with @jordens but there might still be another explanation/condition causing this for me.

But since this actually broke my code which had to wait for an external device reset I think it's an important bug.

nkrackow avatar Apr 14 '22 18:04 nkrackow

I suspect the only practical solution might indeed be to not touch the scaling factor anymore for behavioral stability/compatibility and make the docs say that this is an uncalibrated delay with both an an offset and a scaling factor that depend on cortex-m architecture and e.g. enabled cpu/code location/cache features.

jordens avatar Apr 14 '22 19:04 jordens

I double checked this in https://github.com/rust-embedded/cortex-m/issues/325#issuecomment-1421809232 and indeed we currently only delay half the required time on CM7; the fix in #312 doesn't fully account for the core being able to issue both instructions in the loop at the same time.

I guess ideally we'd experiment a bit more about when a CM7 core will or won't dual-issue the loop, but if it seems like it does always manage it, I'd be inclined to try doing feature detection and delaying for about the right number of cycles for the current core, rather than being wrong on everything (as currently) or 3x worse on one core than another (if we fixed it for cm7).

adamgreig avatar Feb 08 '23 01:02 adamgreig

I'd definitely prefer guaranteeing an error on the long delay side rather than trying "about right". For the frequent applications of e.g. guaranteeing datasheet timing a relative error of 3 is acceptable (if annoying) while 0.9 would be unacceptable.

jordens avatar Feb 09 '23 16:02 jordens