Some SV48 bins are missing coverage (might be some bug in wallyTracer)

Open Zain2050 opened this issue 8 months ago • 1 comments

I have developed SV48 tests, but some bins related to faults on instruction fetch don't show coverage. For example, sv48_reserved_rwx_pte_S_mode.S tests for faults caused by reserved RWX encodings at levels 0-3, but gives the following coverage:

    Cross PTE_res_rwx_s_i_exec                                    
            bin <leaflvl_noexec_s,kilo,sv48,ins_page_fault,set>        1          -    Covered 
            bin <leaflvl_exec_s,kilo,sv48,ins_page_fault,set>          1          -    Covered 
            bin <leaflvl_exec_s,giga,sv48,ins_page_fault,set>          1          -    Covered 
            bin <leaflvl_exec_s,mega,sv48,ins_page_fault,set>          1          -    Covered  
            bin <leaflvl_noexec_s,tera,sv48,ins_page_fault,set>        1          -    Covered  
            bin <leaflvl_exec_s,tera,sv48,ins_page_fault,set> 
            bin <leaflvl_noexec_s,giga,sv48,*,*>            0          1          1    ZERO                 
            bin <leaflvl_noexec_s,mega,sv48,*,*>            0          1          1    ZERO  
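
For anyone comparing reports like the one above by hand, here is a minimal Python sketch that pulls the ZERO bins out of the text. It assumes the report layout shown in this issue (a `bin <...>` tuple followed by count columns and a final status word); the names `parse_bins` and `zero_bins` are made up for illustration and are not part of any coverage tool.

```python
import re

# Matches report lines such as:
#   bin <leaflvl_exec_s,kilo,sv48,ins_page_fault,set>   1   -   Covered
#   bin <leaflvl_noexec_s,giga,sv48,*,*>   0   1   1   ZERO
BIN_RE = re.compile(r"bin\s+<([^>]*)>\s*(.*)")


def parse_bins(report):
    """Return (fields, status) pairs; status is None if the line has no columns."""
    bins = []
    for line in report.splitlines():
        match = BIN_RE.search(line)
        if not match:
            continue  # skip the "Cross ..." header and blank lines
        fields = match.group(1).split(",")
        columns = match.group(2).split()
        status = columns[-1] if columns else None
        bins.append((fields, status))
    return bins


def zero_bins(report):
    """Return the field lists of all bins whose status column is ZERO."""
    return [fields for fields, status in parse_bins(report) if status == "ZERO"]


# Excerpt of the report from this issue (assumed layout).
REPORT = """\
Cross PTE_res_rwx_s_i_exec
    bin <leaflvl_noexec_s,kilo,sv48,ins_page_fault,set>        1          -    Covered
    bin <leaflvl_exec_s,giga,sv48,ins_page_fault,set>          1          -    Covered
    bin <leaflvl_noexec_s,giga,sv48,*,*>            0          1          1    ZERO
    bin <leaflvl_noexec_s,mega,sv48,*,*>            0          1          1    ZERO
"""

for fields in zero_bins(REPORT):
    print("missing:", ",".join(fields))
```

Running the same function on the report from a reordered test makes it easy to see which page sizes lose coverage when the sequence changes.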

The same permissions are checked at every level, but the giga and mega pages are lacking coverage. The test executes in the order tera, giga, mega, kilo. If I change the order to giga, tera, mega, kilo, the coverage becomes:

    Cross PTE_res_rwx_s_i_exec         
            bin <leaflvl_noexec_s,kilo,sv48,ins_page_fault,set>        1          -    Covered 
            bin <leaflvl_exec_s,kilo,sv48,ins_page_fault,set>          1          -    Covered 
            bin <leaflvl_noexec_s,giga,sv48,ins_page_fault,set>        1          -    Covered 
            bin <leaflvl_exec_s,giga,sv48,ins_page_fault,set>          1          -    Covered 
            bin <leaflvl_exec_s,mega,sv48,ins_page_fault,set>          1          -    Covered 
            bin <leaflvl_exec_s,tera,sv48,ins_page_fault,set>          1          -    Covered 
            bin <leaflvl_noexec_s,mega,*,*,*>               0          1          2    ZERO                 
            bin <leaflvl_noexec_s,tera,*,*,*>               0          1          1    ZERO  

This sort of weird behavior also occurred while we were working on SV32, and Huda Sajjad fixed it (it was an issue in wallyTracer.sv). Some other coverpoints show similar behavior. For example, sv48_canonical_S_mode.S generates the following report:

    Cross sv48_canonical_exec_s       
            bin <leaflvl_s,giga,not_zero_and_not_all_ones,sv48,ins_page_fault,set>    Covered   
            bin <*,kilo,*,*,*,*>                            0          1          1    ZERO                 
            bin <*,mega,*,*,*,*>                            0          1          1    ZERO                 
            bin <*,tera,*,*,*,*>                            0          1          1    ZERO    

All these tests are placed on my fork https://github.com/Zain2050/riscv-arch-test/tree/sv48/riscv-test-suite/rv64i_m/vm_sv48/src.

Zain2050 avatar May 17 '25 11:05 Zain2050

The tests work fine on Sail. I also used lockstepverbose and $display to observe the signals. lockstepverbose shows that an instruction fetch exception occurred, but the signals for the missing bins are not correct. For example, the missing bins show a physical address of 235, which is not even in our address range. This led me to think that the tracer isn't propagating the signals correctly.

Zain2050 avatar May 17 '25 11:05 Zain2050

@rosethompson I looked at your recommended changes in wallyTracer. The only one that worked was changing SelHPTW to ~GatedStallW for the memory stage only. It fixed the issue for sv48, but the coverage dropped for sv32. This seemed very strange, so I looked at the waveform to see what's actually going on. It turns out there's an extra TLB miss for the missing bins.

Image

The miss is supposed to be on fetch only (JALR), but we're getting another miss prior to it.

Image

And then after it we get some stalls and the correct TLB miss for JALR.

Image

I don't know why we're getting two misses. lockstepverbose shows only one miss and only one TLB entry being created. For SV48, we get the correct PTE, VA and PA values in the later page table walk, while for SV32 we get the correct values in the earlier walk and wrong values in the second one. That's why the tracer works for either RV32 or RV64 but not both. The current tracer grabs the values from the earlier page table walk (which works for sv32), while the new changes grab the values from the later walk (which works for sv48).

We could add conditionals in the tracer so that it behaves differently for rv32 and rv64. Alternatively, I came up with another solution. For the memory stage we can do this:

    flopenrc #(P.XLEN) IVAdrWReg (clk, reset, 1'b0, SelHPTW | (FlushM & ~FlushW), IVAdrM, IVAdrW);

In this scenario we get a few cycles in which the previous values overlap with the correct Mcause value, and afterwards the register changes to the values from the new page table walk. As a result, it works for both rv32 and rv64.
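
To make the effect of the enable term easier to reason about, here is a small Python behavioral sketch of an enabled register. It assumes flopenrc has the usual semantics (synchronous reset, then clear and load gated by enable) and that the port order is clk, reset, clear, en, d, q as in the line above. The trace is entirely invented for illustration; it is not taken from the waveform and does not model the Wally pipeline.

```python
class FlopEnRC:
    """Behavioral model of an enabled flop with sync reset and clear (assumed semantics)."""

    def __init__(self):
        self.q = 0

    def tick(self, reset, clear, en, d):
        """Apply one rising clock edge and return the new q."""
        if reset:
            self.q = 0
        elif en:
            self.q = 0 if clear else d
        return self.q


def run(trace, enable):
    """Clock a fresh register over the trace; enable() computes en from each cycle."""
    reg = FlopEnRC()
    return [reg.tick(reset=0, clear=0, en=enable(c), d=c["IVAdrM"]) for c in trace]


# Hypothetical four-cycle trace: a walk, an idle cycle, then a flush that
# reaches the M stage one cycle before it reaches the W stage.
TRACE = [
    {"SelHPTW": 1, "FlushM": 0, "FlushW": 0, "IVAdrM": 0x1000},
    {"SelHPTW": 0, "FlushM": 0, "FlushW": 0, "IVAdrM": 0x2000},
    {"SelHPTW": 0, "FlushM": 1, "FlushW": 0, "IVAdrM": 0x3000},
    {"SelHPTW": 0, "FlushM": 1, "FlushW": 1, "IVAdrM": 0x4000},
]


def walk_only(c):
    # Enable only while the page table walker is selected.
    return bool(c["SelHPTW"])


def walk_or_flush(c):
    # Enable term from the flopenrc line: SelHPTW | (FlushM & ~FlushW).
    return bool(c["SelHPTW"] or (c["FlushM"] and not c["FlushW"]))


print([hex(v) for v in run(TRACE, walk_only)])
print([hex(v) for v in run(TRACE, walk_or_flush)])
```

In this made-up trace the walk-only enable keeps the value captured during the walk, while the extra `FlushM & ~FlushW` term lets the register pick up a newer value in the cycle where the M stage is flushed but the W stage is not. Whether that cycle lines up with the second page table walk on real hardware has to be checked against the waveform.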

Image

@rosethompson What do you say about this?

Zain2050 avatar Jun 24 '25 08:06 Zain2050

@Zain2050 can you post the .elf that causes the two TLB misses? You'll probably have to gzip it first.

davidharrishmc avatar Jun 26 '25 14:06 davidharrishmc

@Zain2050 Can you include the elf so I can reproduce the bug? I bet I can debug this really fast if I have the elf.

rosethompson avatar Jun 26 '25 14:06 rosethompson

Fixed with PR #1497

Zain2050 avatar Aug 07 '25 10:08 Zain2050