Mismatch between uarch spec and implementation for FwdRdAck/LOAD_FWDACK transition to Shared
Description:
According to the micro-arch (Section 3.2, Figure 5), the following transitions should occur when a FwdRdAck is received from the owner:
M → S
E → S
Reference Figure:
Currently, the RTL does not handle the M → S transition.
Verification Setup: 1- I wrote the following assertion based on the specification: as_state_m_s_load_fwd_ack: assert property( l2.pipe2.ctrl.valid_S2 && l2.pipe2.ctrl.msg_type_S2_f == LOAD_FWDACK && l2.pipe2.ctrl.l2_way_state_mesi_S2 == MODIFIED |-> l2.pipe2.ctrl.state_mesi_S2 == SHARED );
2- When running formal verification, the antecedent (conditions for firing) was covered, but the consequent (transition to SHARED) was never covered. 3- On inspecting the RTL, I found that there is no case defined for this transition in pipe2 of L2 cache.
Conclusion: When the current state is Modified and a LOAD_FWDACK is received, the next state should transition to Shared, consistent with the specification (Section 3.2, Figure 5). However, the M → S transition is not implemented in the RTL, and as a result, assertions based on the specification fail to cover this scenario. This causes the L2 state machine behavior to diverge from the documented specification and can potentially lead to incorrect coherence behavior when a cache line in M receives a LOAD_FWDACK. To resolve this, the L2 controller state transition logic should be updated to handle the M → S case, similar to the existing E → S transition.
Hi there! Thanks for taking a look at this. Could you tell me more about your methodology here? I want to be sure that we have a full understanding of both the specification and the RTL itself. Have you run any tests to validate that this is causing an issue? It may be that the state machines were not updated to reflect the implementation or that there is more complexity related to the use of writeback guards for example.
I compiled the L2 cache module to verify coherency behavior using a formal tool with property-based verification. In this setup, the NoC1 input is constrained so that only LOAD_REQ and STORE_REQ messages can appear for now. I also modeled NoC3 as a response generator that collects outgoing requests from NoC2 (such as LOAD_MEM or LOAD_FWD) and generates the corresponding responses back on NoC3. Additionally, I constrained that wb_req cannot appear in this environment. This setup allows me to focus on verifying the L2 coherence protocol through formal properties.
Within this environment, I wrote several SystemVerilog assertions covering different state transitions defined in the specification, focusing to compare the state diagram’s transitions with the actual RTL behavior in pipeline1 and pipeline2 of the L2. One of these assertions — for the LOAD_FWDACK message — is not covering the expected scenario described in Section 3.2, Figure 5. Specifically, the M → S transition does not occur in the RTL even though it is defined in the spec.
I manually inspected l2_pipe2’s state transition logic and confirmed that there’s no explicit handling for the LOAD_FWDACK case when the line is in the MODIFIED state. There is a defined case for E → S, but not for M → S. This observation aligns with my formal assertion results, where the antecedent was covered but the consequent (transition to SHARED) was never reached.
Ok so am I right in understanding that you haven't actually identified a correctness issue that manifests as a functional bug (with evidence of that bug)? Rather, you're using the specification to set up the formal properties and those don't match with the implementation? I have a feeling that it's going to be more likely that the transition just shouldn't exist in the specification. That or perhaps there is a performance bug that could be in place because the transition isn't implemented.
Yes, you’re right — I’m not currently checking for a direct functional bug. My focus right now is on verifying the micro-architectural behavior against the specified state transitions in the spec.
However, there are two possibilities here: 1- The specification might be outdated or incorrect, and the transition shouldn’t actually exist. 2- The specification is correct, but the transition hasn’t been properly implemented in RTL — which could potentially lead to a functional bug later on.
Specifically, according to the MESI protocol, a line in the Modified (M) state must transition to Shared (S) when a Load request is forwarded to the owner, and after receiving the acknowledgment, the state must update from M → S. If this transition isn’t implemented, it would indeed represent a functional bug in the coherence behavior.
Ok I see. There are multiple other considerations about the protocol that make it more complex than a simple MESI. For one, the L1.5 is likely sending a writeback guard on noc1 which may change which states are transitioned through or cause other effects (I haven't been fully clear on how this affects the pipe2 state, I just know the messages are necessary for ordering purposes). For another, (as far as I recall) the L1.5 itself effectively tracks MSI states rather than full MESI. I vaguely recall that the L1.5 may also be able to do silent downgrades. As a result, I'm not sure whether a simple monitoring of the transition like this is sufficient to observe these cases which could potentially obscure the M->S transition.
Well, I don’t think the silent downgrade explanation applies in this case. Silent transitions typically occur for E → M, not for M → S.
If the owner transitions from M → S but the directory isn’t updated accordingly, then the directory would still believe the line is in M and owned by another core. When a new request for the same block arrives, the lookup would still return “state = M, owner = X,” leading the L2 to incorrectly issue a LOAD_FWD instead of a DATA_ACK. This could result in a functional bug.
Hi, I have rechecked the modification case through simulation, and the overall behavior is working correctly. The earlier observation was not a functional bug but rather a mismatch between the implementation and the current specification.
The implementation uses the Exclusive state with the dirty bit set to represent the modification case, instead of using a separate Modified state. This behavior is verified in simulation.
I would like to request that the specification be updated accordingly to document this design choice and avoid confusion in the future.
Thank you!
I'm working on putting together RTD markdown sources for the docs that we can bring up to date. Will give you an update once I have something that seems more presentable and maybe you can make a suggestion for how we should reword once we have it? This seems like a case where the mapping just isn't "state == RTL reg" but rather there's a more complex mapping of how those relate, right? i.e. The dirty bit is effectively part of the state evaluation itself?
I really appreciate that you’re working on updating the specification. Once an initial version is ready, I’d be happy to review it and suggest clarifications wherever needed. I understand that the RTL does not use a single MESI state register that directly corresponds to the specification; instead, the effective coherence state is derived from a combination of fields. During my formal verification of coherence protocol, I observed that the L2 represents the “Modified” case as Exclusive + dirty bit = 1, rather than using a dedicated M state. Once the updated documentation is available, I’d be happy to review it and suggest how this behavior can be described more clearly.