sddf icon indicating copy to clipboard operation
sddf copied to clipboard

Add concurrency checking based on GenMC

Open KurtWu10 opened this issue 2 months ago • 1 comments

This PR introduces concurrency testing of the single-producer single-consumer queue in the network subsystem using the GenMC model checker. This enables lightweight and reliable CI testing of the queue's concurrency. To pass GenMC's verification, this PR also refactors the queue implementation.

  1. Testing is performed using a standard multi-threaded C program (ci/genmc/network/test1.c) that uses the queue and embeds functional correctness requirements. This allows the same program to be simulated by the GenMC tool or compiled and run on real hardware without significant modification.

    In the current test program, the queue is tested using busy loops, because the test only consider partial correctness of the queue. The test also does not consider deadlock-freedom of the queue. The test only covers the free queue currently, but not the active queue. Interactions with the signaling protocol are not considered in the PR.

    Tests for queues in other subsystems will be added in future PRs.

  2. The following changes were made to the queue implementation to pass GenMC's checks:

    • Certain accesses are explicitly labelled as atomic accesses, following a similar implementation in LionsOS. These atomic accesses use the release-acquire subset of the C memory model. Each access is annotated to indicate which other atomic access it synchronises with.

    • ~~The distinction of whether the queue will be run on SMP Microkit is removed, following https://github.com/au-ts/sddf/issues/511#issuecomment-3354157538.~~

      In a later revision of this PR, the distinction is retained due to the overhead from stlr (~8% on cpu utilisation on Maaxboard under full load). Compiler fences, whose semantic are similar to the pancake implementation, have been introduced explicitly.

    • The net_queue_length function is removed because It is not used.

    Functional correctness of the modified queue does not include synchronisations induced by the queue.

These changes may have both functional correctness and performance implications that should be addressed before merging:

  • On the correctness side, on AArch64, the compiled program now will usually use the ldar instruction instead of the dmb ish instruction on existing platforms. Since the former is usually weaker than the latter in the Arm memory model, it might reveal under-synchronisation issues in the system, because the GenMC testing only cover correctness of the queue itself.
  • On the performance side, ~~especially under the non-SMP setting, the atomic instruction may introduce overhead~~ there should be no performance overhead from the current implementation under the non-SMP setting. Under the SMP setting, it is claimed that the dmb st barrier is faster than stlr for store-store ordering, which should be validated on our platforms.

Finally, under the non-SMP pancake configuration, ~~the proposed solution introduces discrepancy in the queue implementation~~ the latest proposed solution remains implementable and existing code does not need to be modified. The proposed solution does not change the number of FFIs required under the SMP pancake configuration.

KurtWu10 avatar Oct 09 '25 06:10 KurtWu10

Made some minor fixes to make CI pass.

Ivan-Velickovic avatar Oct 09 '25 08:10 Ivan-Velickovic