VexRiscv
Incorrect Execution in Modelsim
Hi, I am having trouble getting correct behavior when simulating in ModelSim. It's possible that this is due to a configuration issue. I have noted two issues:
- AUIPC followed by an add was not always executing correctly. This resulted in an incorrect address loaded into mtvec (I have removed this instruction as a workaround).
- Branches are not respected after being evaluated as taken: the instructions following the branch still execute. The branch in question was bge #8, #7, offset, which I confirmed was setting the execute_branch_cond signal to true. However, the processor continued executing the next few instructions, including a read from memory, a write to memory, and updating a register without restoring it.
Below is the configuration that I am using:
object GenPE extends App{
  def config = VexRiscvConfig(
    plugins = List(
      new IBusSimplePlugin(
        resetVector = 0x80000000l,
        cmdForkOnSecondStage = false,
        cmdForkPersistence = false,
        prediction = DYNAMIC,
        catchAccessFault = false,
        compressedGen = false
      ),
      new DBusSimplePlugin(
        catchAddressMisaligned = false,
        catchAccessFault = false
      ),
      new DecoderSimplePlugin(
        catchIllegalInstruction = false
      ),
      new RegFilePlugin(
        regFileReadyKind = plugin.SYNC,
        zeroBoot = false
      ),
      new IntAluPlugin,
      new SrcPlugin(
        separatedAddSub = false,
        executeInsertion = true
      ),
      new FullBarrelShifterPlugin,
      new HazardSimplePlugin(
        bypassExecute = true,
        bypassMemory = true,
        bypassWriteBack = true,
        bypassWriteBackBuffer = true,
        pessimisticUseSrc = false,
        pessimisticWriteRegFile = false,
        pessimisticAddressMatch = false
      ),
      new MulPlugin,
      new DivPlugin,
      new CsrPlugin(CsrPluginConfig.allPE(0x80000000l)),
      new CompletionCsrPlugin,
      new MemCpyCsrPlugin,
      new DebugPlugin(ClockDomain.current.clone(reset = Bool().setName("debugReset"))),
      new BranchPlugin(
        earlyBranch = false,
        catchAddressMisaligned = false
      ),
      new YamlPlugin("cpu0.yaml")
    )
  )
CsrPluginConfig.allPE is a slight modification:
def allPE(mtvecInit : BigInt) : CsrPluginConfig = CsrPluginConfig(
  catchIllegalAccess = false,
  mvendorid = 11,
  marchid = 22,
  mimpid = 33,
  mhartid = 0,
  misaExtensionsInit = 66,
  misaAccess = CsrAccess.READ_WRITE,
  mtvecAccess = CsrAccess.READ_WRITE,
  mtvecInit = mtvecInit,
  mepcAccess = CsrAccess.READ_WRITE,
  mscratchGen = true,
  mcauseAccess = CsrAccess.READ_WRITE,
  mbadaddrAccess = CsrAccess.READ_WRITE,
  mcycleAccess = CsrAccess.READ_WRITE,
  minstretAccess = CsrAccess.READ_WRITE,
  ecallGen = true,
  wfiGenAsWait = true,
  ucycleAccess = CsrAccess.READ_ONLY,
  uinstretAccess = CsrAccess.READ_ONLY
)
The CompletionCsrPlugin and MemCpyCsrPlugin are based on the customCSR, which exposes a CSR-mapped register bus for interfacing with external hardware. Neither modifies the pipeline; each is essentially a wrapper for in() and out().
Additional information: The processor is connected directly to SRAM banks (a TCM implementation), which have a guaranteed single-cycle read. As such, the following signals are static:
assign iBus_cmd_ready = 1'b1;
assign iBus_rsp_valid = 1'b1;
assign iBus_rsp_payload_error = 1'b0;
assign dBus_cmd_ready = 1'b1;
assign dBus_rsp_ready = 1'b1;
assign dBus_rsp_error = 1'b0;
The program is loaded from a hex file into a BRAM module and executed directly. The relevant lines are:
default_interrupt_handler:
la a0, _launch_kernel
csrw CSR_MEPC, a0
mret
nop
nop
nop
_launch_kernel:
// load argc into t1
lw t1, 1*4(t0)
// load the count, because we can't branch on an immediate
addi t2, zero, 8
// set the new base ptr
addi t0, t0, 10*4
1: // if argc > 8, then we need to push them on the stack
bge t2, t1, 2f
lw t3, 0(t0) // load the argument
PUSH t3 // push it to the stack <-- this is a macro which decrements sp and then writes to sp
addi t0, t0, 4 // increment the argument pointer by 4
addi t2, t2, 1 // increment the count by 1
// unconditional branch back to the check condition
beq zero, zero, 1b
2:
// and now we can launch the kernel
call kernel
In the above code, a0 was loaded with the incorrect value, even though the instructions were encoded correctly. I have since replaced those instructions with a simple j _launch_kernel, since I am not considering other exceptions / interrupts (bare metal).
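For reference when comparing against the waveform, here is a minimal sketch of the value a0 should hold after the la sequence, assuming la expands to auipc followed by addi with the usual PC-relative hi20/lo12 split. The function names and the addresses in the usage note are hypothetical and are not taken from my actual build.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def split_pcrel(offset):
    """Split a 32-bit PC-relative offset into the (hi20, lo12) immediates.

    lo12 is sign-extended by addi, so hi20 is rounded up whenever bit 11
    of the offset is set.
    """
    lo12 = sign_extend(offset, 12)
    hi20 = ((offset - lo12) >> 12) & 0xFFFFF
    return hi20, lo12


def auipc(pc, hi20):
    """rd = pc + (imm20 << 12), truncated to 32 bits."""
    return (pc + (hi20 << 12)) & 0xFFFFFFFF


def addi(rs1, imm12):
    """rd = rs1 + sign_extend(imm12), truncated to 32 bits."""
    return (rs1 + sign_extend(imm12, 12)) & 0xFFFFFFFF


def expected_la(pc, target):
    """Value a register should hold after `la rd, target` placed at `pc`."""
    hi20, lo12 = split_pcrel((target - pc) & 0xFFFFFFFF)
    return addi(auipc(pc, hi20), lo12)
```

For example, with a hypothetical auipc at 0x80000010 and _launch_kernel at 0x80000024, expected_la(0x80000010, 0x80000024) gives 0x80000024. Any other value in a0 after the addi retires would point at the bypass or PC path rather than the encoding.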
In the loop, bge t2, t1, 2f had the following values: t2 = 8, t1 = 7; however, the lw and sw from the PUSH macro were both executed.
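To make the expected behavior explicit, here is a small reference model of the loop, assuming standard bge semantics (a signed rs1 >= rs2 comparison) and that PUSH issues a single sw. The function names are hypothetical; the model only records which memory operations the loop body should issue.

```python
def to_signed32(x):
    """Interpret a 32-bit register value as a signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def bge_taken(rs1, rs2):
    """bge rs1, rs2, label: taken when rs1 >= rs2 (signed)."""
    return to_signed32(rs1) >= to_signed32(rs2)


def launch_kernel_mem_ops(argc, count=8):
    """Memory operations the argument loop should issue for a given argc."""
    t1, t2 = argc, count
    mem_ops = []
    while not bge_taken(t2, t1):   # 1: bge t2, t1, 2f
        mem_ops.append("lw")       # lw t3, 0(t0)
        mem_ops.append("sw")       # PUSH t3
        t2 += 1                    # addi t2, t2, 1
    return mem_ops
```

With t2 = 8 and t1 = 7 the branch is taken on the first check, so launch_kernel_mem_ops(7) returns an empty list: no lw or sw should reach the data bus, which is not what I see in simulation.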
Please let me know if I can provide any further information.