AXI4 icon indicating copy to clipboard operation
AXI4 copied to clipboard

Write data before address is lost

Open rhinton opened this issue 5 months ago • 3 comments

UPDATE: When I changed DEFAULT_DELAY => 1 PS, the problem was resolved. If a delay of 0 is not supported, I suggest an assert (at least) in Axi4MemoryVTI.vhd.

I'm using Axi4/src/Axi4MemoryVTI.vhd. When I enable SetUseRandomDelays(TransRec, TRUE) in my test bench, I see that the memory component accepts the write burst data (AxiBus.WriteData.Ready = '1') before it accepts the write burst data (AxiBus.WriteAddress.Ready). In this case, the data for the next write transaction is stored to this address. Note I also have DEFAULT_DELAY => 0 ps.

Here is a waveform using my code's AXI bus structure, sorry. The cursor is at the point where the write address is accepted. When I subsequently read from address 0x012, I get 0xECE3 (first data from next transaction, lower right) instead of 0x75C8 (first data from matching transaction, lower left). Image

Following is the OSVVM transcript for this waveform. I see two "Write Address" transactions at 64 ns and 66 ns. (My clock period is 2 ns.)

%%     56 ns    Log    DEBUG     in u_mem,                   Write Data.  WData: 75C8  WStrb: 11  Operation# 1
%%     58 ns    Log    DEBUG     in u_mem,                   Write Data.  WData: D6CD  WStrb: 11  Operation# 2
%%     64 ns    Log    DEBUG     in u_mem,                   Write Address.  AWAddr: 012  AWProt: 000  AWLen: 00000001  AWSize: 001  AWBurst: 01  AWID: 00  AWUser: 0  Operation# 1
%%     64 ns    Log    INFO      in u_mem,                   Memory Write.  AWAddr: 012  AWProt: 000  WData: 75C8  WStrb: 11  Operation# 0
%%     66 ns    Log    DEBUG     in u_mem,                   Write Address.  AWAddr: 012  AWProt: 000  AWLen: 00000001  AWSize: 001  AWBurst: 01  AWID: 00  AWUser: 0  Operation# 2
%%     72 ns    Log    DEBUG     in u_mem,                   Write Response.  BResp: 0  BID: 0  BUser: 0  Operation# 1
%%     80 ns    Log    DEBUG     in u_mem,                   Write Data.  WData: ECE3  WStrb: 11  Operation# 3
%%     80 ns    Log    DEBUG     in u_mem,                   Write Address.  AWAddr: 051  AWProt: 000  AWLen: 00000110  AWSize: 001  AWBurst: 01  AWID: 00  AWUser: 0  Operation# 3
%%     80 ns    Log    INFO      in u_mem,                   Memory Write.  AWAddr: 012  AWProt: 000  WData: ECE3  WStrb: 11  Operation# 1
%%     82 ns    Log    DEBUG     in u_mem,                   Write Data.  WData: C598  WStrb: 11  Operation# 4
%%     82 ns    Log    DEBUG     in u_mem,                   Write Response.  BResp: 0  BID: 0  BUser: 0  Operation# 2
%%     84 ns    Log    DEBUG     in u_mem,                   Write Data.  WData: ADDF  WStrb: 11  Operation# 5
%%     84 ns    Log    INFO      in u_mem,                   Memory Write.  AWAddr: 051  AWProt: 000  WData: ADDF  WStrb: 11  Operation# 2

rhinton avatar Aug 10 '25 01:08 rhinton

I found it. Currently if the delay is 0, in the procedure DoAxiReadyHandshake, there is a sequence of code that

    if Valid = '1' then 
      -- Proper handling
      if not ReadyBeforeValid then 
        Ready <= '1' after ReadyDelayCycles + tpd_Clk_Ready ;
      end if ; 
      
      -- If ready not signaled yet, find ready at a rising edge of clk
      if Ready /= '1' then
        wait on Clk until Clk = '1' and (Ready = '1' or Valid /= '1') ;
        AlertIf(
          AlertLogID, 
          Valid /= '1', 
          TimeOutMessage & 
          " Valid (" & to_string(Valid) & ") " & 
          "deasserted before Ready asserted (" & to_string(Ready) & ") ",
          FAILURE
        ) ;
      end if ; 
    else 
      -- TimeOut handling
      Alert(
        AlertLogID, 
        TimeOutMessage & " Valid: " & to_string(Valid) & "  Expected: 1",
        FAILURE
      ) ;
    end if ; 
    
    -- End of operation
    Ready <= '0' after tpd_Clk_Ready ;

So when ReadyBeforeValid is FALSE, tpd_Clk_Ready = 0 ps, ReadyDelayCycles = 2 * tperiod_Clk, the code execution path is:

        Ready <= '1' after ReadyDelayCycles + tpd_Clk_Ready ;
        wait on Clk until Clk = '1' and (Ready = '1' or Valid /= '1') ;
        Ready <= '0' after tpd_Clk_Ready ;

As a result, this projects a waveform on to Ready for it to change to '1' at delta cycle 0 of in 2 clocks from now. When Clock rises at this time, Ready just became '1', the wait wakes up and immediately assigns Ready back to 0.

There are a number of potential fixes:

  1. Make sure no one sets the delay to 0 ns.
  2. Change the wait statement to: wait on Clk until Clk = '1' and ((Ready = '1' and Ready'last_event > 0 sec) or Valid /= '1')
  3. Reassign the generics to internal constants: constant tpd_Clk_Ready_norm : time := maximum(tpd_Clk_Ready, std.env.resolution_limit) ;
  4. Change the assignment to Ready to WaitForClock(Clk, ReadyDelayCycles) ; -- Make ReadyDelayCycles an integer Ready <= '1' after tpd_Clk_Ready ;

Analysis:

  1. Given that two people have already set the delay to 0, I doubt even if I document it that I can prevent this from happening.
  2. This gives me concern about the overhead of Ready'last_event > 0 sec
  3. This gives me concern that it must be tediously done for every AXI VC. It would be easy to miss one.
  4. I think I like this. While it probably has more overhead than solution 2, it leads to stability. tperiod_Clk no longer needs to be entirely accurate as it is just used for the timeout values, hence, tperiod_Clk could be measured in the VC rather than specified as a generic. This will require an interface change to DoAxiReadyHandshake which can at least temporarily be handled by overloading.

JimLewis avatar Aug 12 '25 20:08 JimLewis

Good find!

rhinton avatar Aug 13 '25 19:08 rhinton

I'm not a fan of enforced delays - independent of 1 ns, 1 ps or 1 fs - converting behavioral simulations to "gate" simulations. Aa a result, all checks in a testbench would need to be changed to a timing aware check. Simple checks at rising_edge() aren't possible anymore.

AMD does it in some of there primitives like BlockRAMs or DSPs. It opens just a can of worms and causes many many problems in simulation as well as it increases overall testbench complexity.

Paebbels avatar Aug 15 '25 05:08 Paebbels