culvert icon indicating copy to clipboard operation
culvert copied to clipboard

Prevent Multiple Culvert Instances From Running

Open iwoloschin opened this issue 1 year ago • 2 comments

After a very silly mixup of which host I was on I accidentally ran two instances of culvert in parallel. One was flashing the firmware and the other triggered a WDT reset. This left my system in a pretty broken state! Fortunately this was in my lab so it was easily rectified, but it made me wonder if culvert could do something to prevent multiple instances from running in parallel?

I assume the right way to do this is with flock? I'd probably just wrap the entire program (somewhere in https://github.com/amboar/culvert/blob/main/src/culvert.c?) instead of specific subcommands, that would prevent my scenario from occurring again.

I'm open to attempting this myself, but wanted to see if anyone else had opinions or thinks this is a terrible idea before I look at writing any code!

iwoloschin avatar Dec 14 '23 15:12 iwoloschin

Okay, some immediate thoughts are there are at least three layers at which we should be locking. We need locking over:

  1. Bridge operations
  2. The fingerprinting operation,
  3. Use of the SoC.

For the latter on the AST2400 and AST2500 we can make use of SCU40[1], the "BMC Firmware Protection" bit ("Forbid SOCFlash to update flash (support from SOCFLASH v1.02.01"). On the AST2600 this moves to SCU100[1]. The differing locations require we perform the SoC locking after we've fingerprinted the address space and loaded the appropriate devicetree. This drives the need to lock while acquiring coherent fingerprint data, as that requires multiple transactions via the chosen bridge. But then we need to lock while driving the bridges, as they're often indirect and require multiple register accesses to complete a transaction, which will conflict in that case of multiple culvert processes.

I don't want to do a big lock up front around all of culvert as that requires unpicking all the assumptions later when we decide we need to get rid of it. We've seen this mistake before with the Big Kernel Lock, so let's dodge that trap from the start. Perhaps you could argue that the BMC Firmware Protection bit falls into this bucket too, but we don't have a more fine-grained coherent way of managing mutual exclusion without resorting to modifying memory (perhaps we could abuse SRAM like the trace command?).

For the bridge lock we should use flock(2) with LOCK_EX on something like /run/lock/culvert.${bridgename}. For the fingerprint lock we can use something like /run/lock/culvert.fingerprint (again with flock(2) and LOCK_EX).

amboar avatar Dec 14 '23 22:12 amboar

actually, we could probably get away without the fingerprint lock, that's not actually achieving much as the operations are all reads.

amboar avatar Dec 15 '23 02:12 amboar