OpenHBMC

WRAP burst sometimes reads wrong data

Open · jemk opened this issue 5 months ago · 1 comment

I occasionally see wrong data returned by WRAP bursts. In a 16-word burst, the last one or two words contain data from an address 16 words higher than expected.

On the AXI bus, the last word of the burst arrives with a long delay and contains wrong data:

[Screenshot: AXI signals]

This happens because the burst to the HyperRAM is terminated too early, after which the hbmc_ctrl state machine starts another burst:

[Screenshot: OpenHBMC debug signals]

After the red trigger line there are two more RWDS transitions with data != 0x0, but the hb_recov_data_vld (dbg_dru_valid in the screenshot) and hb_recov_data (dbg_dru_data) signals don't output them, because the DRU has already been reset. A new burst is then started by the logic meant to continue long bursts once the maximum CS low time is reached. Since that logic isn't designed to handle WRAP bursts, it uses the wrong address. (With INCR bursts I think this doesn't return wrong data; it only reduces performance.)
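A small Python model may help show why the wrong words come from exactly 16 words higher. It is a sketch under stated assumptions, not OpenHBMC code: addresses are in word units with one word per beat, the WRAP burst wraps at a boundary aligned to its total size (as AXI WRAP bursts do), and the continuation logic is assumed to resume at `start + words_done` without wrapping, the way an INCR continuation would. The helper names `wrap_addresses` and `restarted_addresses` are hypothetical.

```python
def wrap_addresses(start: int, beats: int) -> list[int]:
    """Word addresses of a WRAP burst of `beats` words (beats is a power of two).

    The burst wraps at a boundary aligned to its total size, like an AXI WRAP burst.
    """
    base = (start // beats) * beats  # aligned wrap boundary
    return [base + (start - base + i) % beats for i in range(beats)]


def restarted_addresses(start: int, beats: int, done: int) -> list[int]:
    """Addresses actually read if the burst is cut after `done` words and then
    resumed INCR-style at start + done (assumed continuation behavior)."""
    first_part = wrap_addresses(start, beats)[:done]
    resumed = [start + i for i in range(done, beats)]  # no wrap applied
    return first_part + resumed


# 16-word WRAP burst starting at word 13, cut off with two words remaining.
expected = wrap_addresses(13, 16)
actual = restarted_addresses(13, 16, done=14)
for beat, (e, a) in enumerate(zip(expected, actual)):
    if e != a:
        print(f"beat {beat}: expected word {e}, read word {a} (+{a - e})")
# prints:
# beat 14: expected word 11, read word 27 (+16)
# beat 15: expected word 12, read word 28 (+16)
```

Under these assumptions, every word fetched after the wrap point lands exactly 16 words too high. If the burst starts on the wrap boundary, the two address sequences are identical, so an INCR-style restart would only cost performance there, as with INCR bursts.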

The early DRU reset appears to be caused by a different RWDS waveform in this capture: the DRU needs one more bit before it can recover the next word, so hb_recov_data_vld goes low for a cycle. When this happens close to the end of a burst, the ST_RD_8 state resets the DRU even though there is still data to recover.

To verify this, I added an extra state, ST_RD_9, so that the state machine only advances once hb_recov_data_vld has been low for at least two consecutive cycles. In a similar situation, this produces a correctly read WRAP burst:

[Screenshot: OpenHBMC debug signals]

Notice the single low dru_valid cycle before the last words.

I'm not sure whether this is the proper fix for the issue, but with this small change the MicroBlaze system using OpenHBMC has now run for three nights without any errors, whereas before it crashed after 1-2 hours:

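```verilog
            ST_RD_8: begin
                // First low cycle of hb_recov_data_vld: don't reset yet,
                // the DRU may only need one more bit to recover the next word.
                if (~hb_recov_data_vld) begin
                    rd_state <= ST_RD_9;
                end
            end

            // New state: finish only if hb_recov_data_vld is still low.
            ST_RD_9: begin
                if (~hb_recov_data_vld) begin
                    // Second consecutive low cycle: reset the DRU, burst is done.
                    dru_iserdes_rst <= 1'b1;
                    rd_state <= ST_RD_DONE;
                end else begin
                    // Valid data resumed: go back and keep receiving.
                    rd_state <= ST_RD_8;
                end
            end
```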
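The effect of the extra state can be illustrated with a simplified cycle-level model in Python. This is a hypothetical sketch, not the OpenHBMC RTL: it ignores register latency, treats each trace element as one clock cycle after the FSM reaches ST_RD_8, and models the DRU reset as dropping every later word. With `low_cycles_to_reset=1` it approximates the original ST_RD_8 behavior; with `2` it approximates ST_RD_8 plus ST_RD_9. The function name `capture_tail` is made up for illustration.

```python
from typing import Optional, Sequence


def capture_tail(trace: Sequence[Optional[int]], low_cycles_to_reset: int) -> list[int]:
    """Return the words captured before the DRU is reset.

    Each element of `trace` is one clock cycle: an int means hb_recov_data_vld
    is high with that data word, None means hb_recov_data_vld is low.
    The DRU is reset after `low_cycles_to_reset` consecutive low cycles.
    """
    captured: list[int] = []
    low_run = 0
    for word in trace:
        if word is None:
            low_run += 1
            if low_run >= low_cycles_to_reset:
                break  # dru_iserdes_rst asserted: remaining words are dropped
        else:
            low_run = 0  # valid data resumed, restart the low-cycle count
            captured.append(word)
    return captured


# Word 13 is followed by a one-cycle gap while the DRU waits for another bit.
tail = [12, 13, None, 14, 15, None, None]
print(capture_tail(tail, low_cycles_to_reset=1))  # original ST_RD_8: [12, 13]
print(capture_tail(tail, low_cycles_to_reset=2))  # ST_RD_8 + ST_RD_9: [12, 13, 14, 15]
```

In this model the patch tolerates a single low cycle in the middle of the data, but a gap of two or more consecutive low cycles would still reset the DRU early.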

Tested with 166.6 MHz HyperRAM clock, BUFIO/BUFR mode, 100 MHz AXI clock, on a TE0725.

jemk, Sep 19 '24 08:09