
Accessing FP CSRs crashes the debug unit

Status: Open. jeremybennett opened this issue 2 years ago (10 comments).

At the request of @MikeOpenHWGroup this is a duplicate of core-v-mcu issue #159

Type

Indicate the type of problem you found:

  • Other: Although possibly technically valid behavior, locking up the Debug Unit is not a helpful outcome.

Steps to Reproduce

First, build the Verilator model library from the core-v-mcu repository. Note: this requires Verilator 4.200 or later (the example here used 4.203).

  1. Check out commit 145a458
     git checkout 145a458
  2. Build using fusesoc
     make model-lib

The Verilator model driver is in @jeremybennett's fork of the embdebug-target-core-v repository. Build it as follows (Linux instructions):

  1. Check out the jpb-fpu-csr-access-issue branch and switch to the target directory
     git checkout jpb-fpu-csr-access-issue
  2. Edit the Makefile so that MCU_DIR points to your checkout of the core-v-mcu repository.
  3. Build the testbench
     make
  4. Run the testbench
     ./testbench.exe

Note that the cycle and instret CSRs and the PC (also a CSR) are read successfully. Then a non-existent FPU CSR (fflags) is read. After that, fflags, cycle, instret and the PC all return the last value that was read successfully (the previous read of the PC).

$ ./testbench.exe 
Timescale 1ns / 1ns

<output about various system state>

cycle = 0x000013dc
instret = 0x0000051b
PC = 0x1a0001bc

fflags = 0x1a0001bc
cycle = 0x1a0001bc
instret = 0x1a0001bc
PC = 0x1a0001bc
  5. If it is helpful, rerun the testbench to generate a VCD of the model
     ./testbench.exe --vcd=example.vcd

Use a VCD filename of your choice; the VCD is created in the directory where the program is run.

jeremybennett · Aug 19 '21

I have looked further, and the issue is that accessing the FPU CSRs never completes. When you look at the cmderr field of abstractcs, it reports 1 (busy), indicating the attempt to read the CSR is still in progress. It never leaves this state.

jeremybennett · Nov 04 '21

Can you boil that down into a single asm instruction? Perhaps:

csrrs x1, fcsr, x0

MikeOpenHWGroup · Nov 04 '21

Can you boil that down into a single asm instruction? Perhaps:

csrrs x1, fcsr, x0

It isn't accessing the CSR that is the problem, it is accessing the CSR via the debug unit that is the problem.

jeremybennett · Nov 10 '21

I've tried to reset the debug unit when this occurs.

The recommended approach to "unsticking" the debug unit is to first reset the hart using ndmreset or hartreset, then reset the debug unit using dmactive. Using hartreset is not an option, since it is not implemented in the PULP debug unit.

Using ndmreset (toggle high, then low) followed by dmactive (toggle low then high) has no effect. The debug unit remains stuck.

jeremybennett · Nov 10 '21

This may or may not be related, but as I understand it, you are attempting to access a non-existent CSR while the core is in debug mode. Assuming that this qualifies as a debug exception, the core will jump to address dm_exception_addr_i[31:0], a primary input to the core that is currently hardwired to 0x0 in the MCU (link).

What does the BSP expect to happen when an exception occurs while the core is in debug mode?

MikeOpenHWGroup · Nov 10 '21

dm_exception_addr_i[31:0], which is a primary input to the core, which is currently hardwired to 0x0 in the MCU (link).

That is not correct; it should point to the appropriate address in the Debug Module.

Silabs-ArjanB · Nov 11 '21

Resurrecting this discussion...

dm_exception_addr_i[31:0], which is a primary input to the core, which is currently hardwired to 0x0 in the MCU (link).

That is not correct; it should point to the appropriate address in the Debug Module.

This was resolved in CORE-V-MCU Issue #194:

Fixed - dm_exception address set to 0x1a11080c as specified in dm module rom code.

Having said that, we have not confirmed that this issue was investigated at the CV32E40P core level, and I do not see any attempt to access floating point CSRs in any of the debug_test test-programs. Investigating...

MikeOpenHWGroup · Dec 20 '23

I'll try to answer the question of accessing FCSR in debug routines while in debug mode. It depends...

If you are using CV32E40Pv1 or CV32E40Pv2 without FPU, accessing FPU CSRs or FREGS jumps directly to the debug exception handler at dm_exception_addr_i. If you are using CV32E40Pv2 in the RISC-V F configuration, accessing FPU CSRs or FREGS does the same if MSTATUS.FS = OFF, but if MSTATUS.FS != OFF the accesses complete correctly. Lastly, when using CV32E40Pv2 in the RISC-V Zfinx configuration, accessing FPU CSRs (there are no FREGS) always works, as MSTATUS.FS is not implemented.

Related to these FCSR/FREGS accesses, there is a gcc issue with interrupt routine preamble generation: the generated preamble doesn't look at MSTATUS.FS, which creates a deadlock situation in some cases when interrupts are received.

And to answer Mike's last question: yes, we do execute F state (FCSR & FREGS) save/restore in debug program code as part of the CV32E40Pv2 verification test plan. We also have additional debug tests where some F computation is actually done in debug routines, to check that nothing prevents it from being done as in normal execution mode.

pascalgouedo · Dec 21 '23

Thanks for this Pascal. I knew most, but not all of this. Very helpful. A few comments/questions:

If you are using CV32E40Pv1 or CV32E40Pv2 without FPU, accessing FPU CSR or FREGS will directly jump to debug exception handler at dm_exception_addr_i.

This is a surprise. I would have thought that accessing FPU CSRs or FREGS when FPU==0 or MSTATUS.FS = OFF would result in an illegal instruction, not a trap to debug-mode.

we do execute F state (FCSR & FREGS) save/restore in debug program code

Can you point me at the right test-program for this?

MikeOpenHWGroup · Dec 21 '23

This is a surprise. I would have thought that accessing FPU CSRs or FREGS when FPU==0 or MSTATUS.FS = OFF would result in an illegal instruction, not a trap to debug-mode.

That is what I meant: it jumps to the exception handler specific to debug (dm_exception_addr_i), because the exception happens during debug program execution while in debug mode.

Can you point me at the right test-program for this?

corev_rand_pulp_instr_debug

pascalgouedo · Dec 21 '23