RIOT icon indicating copy to clipboard operation
RIOT copied to clipboard

periph: GPIO drivers are not thread safe

Open haukepetersen opened this issue 9 years ago • 30 comments

From #4830: The implementation of almost all our GPIO drivers is not thread safe. Though we have not seen the effect of this somewhere, there are possibilities that pin configurations (or for some boards also set/clear operations) might be lost. Example are lines like these:

// from stm32f4 gpio.c -> gpio_init()
port->MODER |= (1 << (2 * pin_num));

This is compiled into a non atomic read-modify-store sequence. So if interrupted and the same register is modified by a different thread, this modification will be lost.

It would be good to find a global solution/best practice on how to mitigate this. My first quick thought would be to actually disable interrupts while writing this reg, as these sequences are very small and mutexes would be too much overhead.

NOTE: The same issue might also effect other peripheral driver than GPIO ones, but as most work on a device-level without (many) shared global registers, the chances are not quite as high.

haukepetersen avatar Feb 21 '16 13:02 haukepetersen

I would start by deciding whether it should be thread-safe. Right now the documentation does not say anything about this topic.

Also - if it is decided it should be thread-safe, please document the realtime side-effects of the implementation.

LudwigKnuepfer avatar Feb 21 '16 15:02 LudwigKnuepfer

For most ARM MCU's we could use bit-banding. This would make it thread safe without extra side effects.

DipSwitch avatar Feb 21 '16 15:02 DipSwitch

Good point. I am not quite sure though which ones to support bit banding, I think not all of the cortexes do...

@LudwigKnuepfer: Yes, they need to be thread safe, otherwise we can always end up with bad behavior as described above. (see also -> #4758). Real-time side effects are almost non existant, as the intervals for which we might have to disable interrupts are very (20 instructions max) short.

haukepetersen avatar Feb 21 '16 15:02 haukepetersen

All STM32 >= Cortex-M3 support Bit-Banding :) Freescale to I belief. On 21 Feb 2016 16:45, "Hauke Petersen" [email protected] wrote:

Good point. I am not quite sure though which ones to support bit banding, I think not all of the cortexes do...

@LudwigKnuepfer https://github.com/LudwigKnuepfer: Yes, they need to be thread safe, otherwise we can always end up with bad behavior as described above. (see also -> #4758 https://github.com/RIOT-OS/RIOT/issues/4758). Real-time side effects are almost non existant, as the intervals for which we might have to disable interrupts are very (20 instructions max) short.

— Reply to this email directly or view it on GitHub https://github.com/RIOT-OS/RIOT/issues/4866#issuecomment-186846646.

DipSwitch avatar Feb 21 '16 15:02 DipSwitch

Yes, they need to be thread safe, otherwise we can always end up with bad behavior as described above.

This is the invariant of operations which are not thread safe, not a reason why this particular interface should be thread safe. If an interface is known not to be thread safe, it can still be used in a thread-safe manner through external measurements. I am not saying the interface should or should not be thread-safe, just pointing this out. So again - there needs to be a decision that we actually want this first ;)

Real-time side effects are almost non existant ...

If they are present they may add up, so please document them unless there is good reason not to do so. I don't understand why you try to argue against this in the first place.

LudwigKnuepfer avatar Feb 21 '16 15:02 LudwigKnuepfer

see also -> #4758

The relevant part seems to be @haukepetersen writing there is a need for thread-safety of these APIs. I could not see any objections to this proposal in that place at the time of this writing.

LudwigKnuepfer avatar Feb 21 '16 15:02 LudwigKnuepfer

we could use bit-banding

That does uglify the code a lot...

kaspar030 avatar Feb 21 '16 19:02 kaspar030

Real-time side effects are almost non existant ...

If they are present they may add up, so please document them unless there is good reason not to do so. I don't understand why you try to argue against this in the first place.

Thing is, there's a lot of work in writing the actual code. Standing in row two and shouting, e.g., "document realtime-sideeffects!" will always be the right thing to do, but at the same time, always be annoying and taking wind out of the effort.

kaspar030 avatar Feb 21 '16 19:02 kaspar030

@kaspar030, so what's your message? People should keep their silence when they observe that changes may have an effect on real-time guarantees? I agree 100% with you that it is sometimes annoying that you want to do one contribution and are suddenly facing several other more or less related issues that slow down the process of your original task. However, I really appreciate people looking thoroughly at these implications and stating them instead of simply waving things through.

Disclaimer: this has little to do with the actual PR here.

OlegHahm avatar Feb 21 '16 23:02 OlegHahm

@kaspar030, so what's your message?

We have ~150 "disableIRQ()" in master, and none of them documents real-time effects. Suddenly insisting on documenting them in a random PR feels - annoying.

I was merely answering the question why @haukepetersen argued against documenting them.

kaspar030 avatar Feb 21 '16 23:02 kaspar030

"Things have always been bad, let's not start to improve them." That's your message then? Seriously?

OlegHahm avatar Feb 21 '16 23:02 OlegHahm

Whatever.

kaspar030 avatar Feb 21 '16 23:02 kaspar030

@LudwigKnuepfer: I think my answers came out the wrong way, sorry.

regarding real-time side-effects: I am not opposing to document them, my first feeling was rather in the direction that @kaspar030 stated. But being the first one to start documenting such thing is always a big overhead (how to document it, where to put it, etc...), and currently I just have not the time or nerves to be the one that starts this... But to not loose this topic, how about we open an issue of it's own to address this (for this case but also for all other occurrences of disableISR/restoreISR?!

regarding thread-safety: I seem to have remembered wrong, I thought there were already more discussion about this in #4758. So let me list my arguments here:

  • same as SPI and I2C, GPIO ports are shared ressources (e.g. pin A.01 is used by driver A, pin A.23 by driver B)
  • telling all drivers to take care themselves (external locking) to do only atomic access to these pins is not practicable
  • forcing all peaces of software that use GPIO to run in a single thread is not feasible
  • introducing a require/release (as for buses) is way to much overhead for a simple 'gpio_set/clear'

haukepetersen avatar Feb 22 '16 10:02 haukepetersen

For the set/clear/toggle it seems like most MCUs have special registers for only setting or clearing GPIOs (as opposed to "writing" a value to the pins) to work around this read-modify-write race, which means that it will not require any locking. I think the problem here is when two threads for different devices/applications try to reconfigure pins in the same port simultaneously since that can cause a read-modify-write cycle to interrupt in the middle and the interrupting write will be overwritten after returning control to the original thread.

Suggested plan:

  • Ensure that all GPIO drivers for platforms which support non-locking pin set/clear are actually using these registers instead of output |= (1 << pin);
  • Add either implicit locking (mutex inside the gpio functions) or disable IRQs for initialization and reconfiguration functions which require RMW access to hardware registers.

Cortex-M3/M4 bit banding can be used to avoid locking for single bit flips but the code becomes quite long if you need to modify more than one or two bits.

jnohlgard avatar Feb 22 '16 12:02 jnohlgard

Sure, we can split the realtime side-effects documentation to a separate issue. Maybe it's more purposeful to provide some tool for the task of identifying the possible side-effects anyways.

LudwigKnuepfer avatar Feb 22 '16 16:02 LudwigKnuepfer

@kaspar030 did create an issue already: https://github.com/RIOT-OS/RIOT/issues/4872

LudwigKnuepfer avatar Feb 22 '16 17:02 LudwigKnuepfer

won't be addressed for the current release -> postponed (though still highly relevant)

haukepetersen avatar Apr 08 '16 09:04 haukepetersen

Did you make a note somewhere to put into the known issues for the release notes?

OlegHahm avatar Apr 08 '16 09:04 OlegHahm

put it into the release note draft

haukepetersen avatar Apr 11 '16 11:04 haukepetersen

I'm afraid that we need to postpone this again, anyone knows if the new GPIO API takes into account this issue? Maybe @PeterKietzmann ?

kYc0o avatar Jul 11 '16 15:07 kYc0o

Not that I know. IMO the issue is not about the API design but more about (i) the driver implementation itself and/or (ii) documentation about the programming model

PeterKietzmann avatar Jul 12 '16 09:07 PeterKietzmann

In #5753 I modified the CC2538 gpio driver to use bit-banding. I agree with @DipSwitch that this method can be used to fix a lot of the Cortex-Ms. Furthermore, I think the BITBAND_ADDR() and BITBAND_VAR32() macros from cpu/k60 can be reused since the MCUs tend to alias regions 0x_0000000 to 0x_2000000 consistently.

hexluthor avatar Aug 16 '16 18:08 hexluthor

@haukepetersen, no news on this I suppose?

OlegHahm avatar Jan 15 '17 14:01 OlegHahm

This issue would affect gpio expanders, isn't it @ZetaR60 ?

kYc0o avatar Jul 20 '18 16:07 kYc0o

@kYc0o Thanks for the heads up on this. For the gpio_exp interface in #9190 gpio_toggle is affected because it is implemented as a read followed by a write to reduce the number of callbacks stored. For other gpio_exp functions, it depends on the implementation of the specific GPIO expander (#9054 is affected).

ZetaR60 avatar Jul 22 '18 20:07 ZetaR60

The issue with the extension API is fixed in #9860.

ZetaR60 avatar Aug 29 '18 19:08 ZetaR60

Should we close this one?

OlegHahm avatar Sep 17 '18 19:09 OlegHahm

Note: My statement above only pertains to the GPIO extension API (which is not yet merged), and not to GPIO drivers in general.

ZetaR60 avatar Sep 17 '18 19:09 ZetaR60

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

stale[bot] avatar Aug 10 '19 07:08 stale[bot]

GPIO LL is thread safe and clearly communicates the requirements. Once periph/gpio is replaced with an API compatible high level GPIO API (building upon GPIO LL and external GPIO extender drivers), the issue will be solved.

maribu avatar Sep 16 '22 12:09 maribu