Program is halted when running Coremark.riscv
Hello
I want to run CoreMark on the simulator, and I used the code here: riscv-coremark, which generates two versions: one for bare metal and the other for Linux or pk. I compiled them and copied them into the built benchmarks folder.
I used the following command:
sims -sys=manycore -vlt_run -x_tiles=1 -y_tiles=1 coremark.bare.riscv -ariane -precompiled -rtl_timeout=1000000
For the bare-metal version, the execution halts before the RTL timeout, as follows:
TILE0-------------------------------------
0000000001d3904a
0000000001d3904a P1S3 msg type: st_req addr: 0x0080001000, Data_size: 100, cache_type: 0
P1S3 valid: recycle: 0, stall: 0
State wr en: 1
Dir data: 0x0000000000000000
CSM enable: 0
Msg from mshr: 1
P1S3 addr: 0x0080001000
P1S3 valid: l2_hit: 1, l2_evict: 0
Data data: 0x000000000000000000000000000000000000
State:mesi: 00, vd: 10, subline: 0000, cache_type: 0, owner: 000000
sdid: 0, lsid: 0
TILE0-------------------------------------
0000000001d3923e
0000000001d3923e P1S4 msg type: st_req addr: 0x0080001000, Data_size: 100, cache_type: 0
P1S4 valid: recycle: 0, stall: 0, msg_stall: 0, dir_data_stall: 0, stall_inv_counter: 0, stall_smc_buf: 0, smc_stall: 0, global_stall: 0, broadcast_stall: 0
Control signals: 0100011101000011110
CSM enable: 0
broadcast coreid: ( 0, 0, 0)
broadcast state: 0, broadcast op val: 0
Special addr type: 0
MSHR state wr en: 0
MSHR data wr en: 0
MSHR data wr : 0x00000000000000040a00c0080001000
MSHR inv counter : 0
State wr en: 1
Dir data: 0x0000000000000000
Dir sharer counter: 1
State data in: 0x01000000000004040
State data mask in: 0x0f0000000000067ff
State wr addr: 0x40
Msg data: 0x0000000000000000
SMC miss: 0
SMC data out: 0x00000000
SMC tag out: 0x0000
SMC valid out: 0x0
Msg send valid: 1, send ready: 1, mode: 011, length: 00000010
Msg send type: data_ack Msg send data_size: 000, cache_type: 0, mesi: 11, l2_miss: 1, mshrid: 00000011, subline_vector: 0000
Msg from mshr: 1
P1S4 addr: 0x0080001000
P1S4 valid: l2_hit: 1, l2_evict: 0
Data data: 0x000000000000000000000000000000000000
State:mesi: 00, vd: 10, subline: 0000, cache_type: 0, owner: 000000
Msg send: addr: 0x0080001000, dst_x: 00000000, dst_y: 00000000, dst_fbits: 0000
Msg send data: 0x00000000000000000000000000000000
src x: 00000000, src y: 00000000
sdid: 0, lsid: 0
0000000001d39432
TILE0 noc2 flit raw: 0x00000000008740f8
0000000001d39626
TILE0 noc2 flit raw: 0x0000000000000000
0000000001d3981a
TILE0 noc2 flit raw: 0x0000000000000000
30646000 TILE0 L1.5: Received NOC2 MSG_TYPE_DATA_ACK mshrid 3, l2miss 1, f4b 0, ackstate 3, address 0x0000000000
Data1: 0x0000000000000000
Data2: 0x0000000000000000
Data3: 0x0000000000000000
Data4: 0x0000000000000000
0000000001d39ef0 L15 TILE0:
NoC1 credit: 8
NoC1 reserved credit: 1
TILE0 Pipeline: * X X
Stage 1 status: Operation: L15_REQTYPE_ACKDT_ST_IM
TILE0 S1 Address: 0x0080001000
L15_MON_END
0000000001d3a0e4 L15 TILE0:
NoC1 credit: 7
NoC1 reserved credit: 0
TILE0 Pipeline: * * X
Stage 1 status: Operation: L15_REQTYPE_ACKDT_ST_IM
TILE0 S1 Address: 0x0080001000
Stage 2 status: Operation: L15_REQTYPE_ACKDT_ST_IM
TILE0 S2 Address: 0x0080001000
TILE0 S2 Cache index: 0
DTAG way0 state: 0x3
DTAG way0 data: 0x0000000080024000
DTAG way1 state: 0x2
DTAG way1 data: 0x0000000080003800
DTAG way2 state: 0x3
DTAG way2 data: 0x0000000080004000
DTAG way3 state: 0x0
DTAG way3 data: 0x0000000000000000
L15_MON_END
0000000001d3a2d8 L15 TILE0:
NoC1 credit: 7
NoC1 reserved credit: 0
TILE0 Pipeline: X * *
Stage 2 status: Operation: L15_REQTYPE_ACKDT_ST_IM
TILE0 S2 Address: 0x0080001000
TILE0 S2 Cache index: 0
MESI write way: 0x3
MESI write data: 0x3
HMT writing: 0
Stage 3 status: Operation: L15_REQTYPE_ACKDT_ST_IM
TILE0 S3 Address: 0x0080001000
TILE0 WMT read index: 00
WMT way 0: 1 0x1
WMT way 1: 0 0x0
WMT way 2: 0 0x0
WMT way 3: 0 0x0
L15_MON_END
30647500 TILE0 L1.5 th0: Sent CPX ST_ACK l2miss 1, nc 0, atomic 0, threadid 0, pf 0, f4b 0, iia 0, dia 0, dinval 0, iinval 0, invalway 0, blkinit 0
Data0: 0x0000000000000000
Data1: 0x0000000000000000
Data2: 0x0000000000000000
Data3: 0x0000000000000000
0000000001d3a4cc L15 TILE0:
NoC1 credit: 8
NoC1 reserved credit: 0
TILE0 Pipeline: X X *
Stage 3 status: Operation: L15_REQTYPE_ACKDT_ST_IM
TILE0 S3 Address: 0x0080001000
L15_MON_END
Info: spc(0) thread(1) -> timeout happen
Info: spc(0) thread(2) -> timeout happen
Info: spc(0) thread(3) -> timeout happen
Info: spc(0) thread(1) -> timeout happen
...
I also tried the other version, for Linux and pk (just to check), and it keeps running until it reaches the timeout. Could you please help?
Looking at the code, this version assumes some kind of test harness that doesn't exist in our environment. You'd have to modify the compilation environment to use the syscalls.c, crt.S, etc. that we have in the OpenPiton+Ariane environment instead. I don't think that should be too troublesome, but it will take some tinkering.
I was able to build it by copying $PITON_ROOT/piton/verif/diag/assembly/include/riscv/ariane/* into the riscv-coremark/riscv64-baremetal/ directory and then adding -fno-builtin-printf to the gcc build command. However, I get a bad trap when it runs. I'm not sure what the issue is there.
Looking at trace_hart_0.log, it seems there's an illegal instruction exception.
It looks like the issue is the rdcycle instruction; there is a discussion of this on the PULP forum here: https://pulp-platform.org/community/showthread.php?tid=133
Based on that post, I think CoreMark essentially assumes it's running in some kind of user-mode environment where it can use rdcycle. Ariane doesn't seem to have the mcounteren/scounteren registers that would let you enable lower-privilege (e.g. user-mode) access to the cycle counter.
As Florian and Frank say in the post above, you can do one of two things:
- Expand the trap handling to enable access to the register (this should be a small software change). You could add an if statement in handle_trap that checks the cause. If the trap was caused by a rdcycle, perform the read there (you're in machine mode at that point) and put the value in the right place. If it's not a rdcycle, just exit as it does already. See handle_trap here: https://github.com/PrincetonUniversity/openpiton/blob/659e115016d3f9f570dc976dfe1514d1d8db504d/piton/verif/diag/assembly/include/riscv/ariane/syscalls.c#L110-L113
- Implement the mcounteren or scounteren register in Ariane and modify the software to enable access at the beginning. This is more involved but would improve Ariane.
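A minimal sketch of the first option in C, for illustration only. It assumes that OpenPiton's handle_trap follows the riscv-tests convention: it receives (cause, epc, regs), returns the epc to resume at, and regs[] is the save area that trap_entry restores into x1..x31. Check the linked syscalls.c and crt.S before relying on that. The helper names decode_rdcycle and emulate_rdcycle are hypothetical. The sketch only recognises rdcycle, i.e. csrrs rd, cycle, x0 (CSR 0xC00), under an illegal-instruction cause (mcause 2), and it resumes after the 4-byte instruction.

```c
#include <stdbool.h>
#include <stdint.h>

#define CAUSE_ILLEGAL_INSTRUCTION 2u   /* mcause value for illegal instruction */
#define CSR_CYCLE                 0xC00u
#define OPCODE_SYSTEM             0x73u
#define FUNCT3_CSRRS              2u

/* Returns true if insn is "csrrs rd, cycle, x0" (the rdcycle pseudo-instruction)
   and stores the destination register index in *rd. */
static bool decode_rdcycle(uint32_t insn, unsigned *rd)
{
    uint32_t opcode   = insn & 0x7Fu;
    uint32_t rd_field = (insn >> 7) & 0x1Fu;
    uint32_t funct3   = (insn >> 12) & 0x7u;
    uint32_t rs1      = (insn >> 15) & 0x1Fu;
    uint32_t csr      = insn >> 20;

    if (opcode != OPCODE_SYSTEM || funct3 != FUNCT3_CSRRS ||
        rs1 != 0u || csr != CSR_CYCLE)
        return false;
    *rd = (unsigned)rd_field;
    return true;
}

/* Hypothetical helper: emulate a trapped rdcycle from machine mode.
   regs[] is the trap save area (regs[i] holds xi). On success it writes the
   counter value into regs[rd] and returns the epc of the next instruction;
   it returns 0 if the trap is not an emulatable rdcycle. */
static uintptr_t emulate_rdcycle(uintptr_t cause, uintptr_t epc, uint32_t insn,
                                 uintptr_t regs[32], uint64_t mcycle)
{
    unsigned rd;

    if (cause != CAUSE_ILLEGAL_INSTRUCTION || !decode_rdcycle(insn, &rd))
        return 0;
    if (rd != 0u)                 /* writes to x0 are discarded */
        regs[rd] = (uintptr_t)mcycle;
    return epc + 4u;              /* rdcycle is a 32-bit instruction */
}
```

On the target, handle_trap would fetch insn from *(uint32_t *)epc, read mcycle with a csrr instruction, return the result of emulate_rdcycle when it is nonzero, and otherwise fall through to the existing exit.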
Since this caught my attention and I've seen others complain of the same problem, I decided to help out the Ariane project and implement the registers (option 2). You can see my PR here: https://github.com/pulp-platform/ariane/pull/411
If you use this, follow my steps above, and add the following to our crt.S (not the one from riscv-coremark, which should be replaced):
@@ -110,6 +139,12 @@ _start:
la t0, trap_entry
csrw mtvec, t0
+ # initialize mcounteren and scounteren
+ # allow access from user mode
+ li a0, 0x7
+ csrw mcounteren, a0
+ csrw scounteren, a0
+
# initialize global pointer
.option push
.option norelax
then you will be able to run the bare-metal CoreMark. It will run for a very long time in simulation (particularly because there's a lot of printing)! I added -DCORE_DEBUG to my gcc build command for the bare-metal version so that it runs only a single iteration, and it finishes in a much more watchable timeframe. That's obviously not a valid benchmark result, but it does at least show that it's making progress as intended and not just getting stuck in an infinite loop.
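For reference, the 0x7 written to mcounteren and scounteren in the crt.S change above sets three enable bits defined by the RISC-V privileged spec: CY (bit 0, cycle), TM (bit 1, time) and IR (bit 2, instret). mcounteren gates access from the next lower privilege mode and scounteren gates U-mode access, so user code needs the bit set in both. The small helper below (counteren_value is a hypothetical name, not part of the PR) just spells out how that constant is composed.

```c
#include <stdint.h>

/* Bit positions shared by mcounteren and scounteren. */
#define COUNTEREN_CY (1u << 0)  /* cycle   (rdcycle)   */
#define COUNTEREN_TM (1u << 1)  /* time    (rdtime)    */
#define COUNTEREN_IR (1u << 2)  /* instret (rdinstret) */

/* Hypothetical helper: build the value loaded into a0 before
   "csrw mcounteren, a0" / "csrw scounteren, a0".
   Bits 3..31 (HPM3..HPM31) are left clear. */
static uint32_t counteren_value(int cycle, int time, int instret)
{
    uint32_t v = 0;
    if (cycle)   v |= COUNTEREN_CY;
    if (time)    v |= COUNTEREN_TM;
    if (instret) v |= COUNTEREN_IR;
    return v;
}
```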
Also, be careful with that version of CoreMark. Looking at the core_portme.* files, it seems they're using arbitrary values for the clock frequency and so on. You'll need to change those to make sure you're getting valid numbers.
Thank you, Jonathan, for your efforts to help me. I followed your instructions step by step and also got a good trap with 1 iteration. As you mentioned, the results are not valid, since CoreMark must run for at least 10 seconds and 1 iteration is not enough for that. So I am trying to increase the number of iterations, but I already hit a timeout. Questions:
- Is there a way to prevent the timeout?
- In the fake_uart log I got this:
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 527497
Total time (secs): %f
Iterations/Sec : %f
ERROR! Must execute for at least 10 secs for a valid result!
Iterations : 1
Compiler version : GCC8.2.0
Compiler flags : -O2 -mcmodel=medany -static -std=gnu99 -fno-builtin-printf -fno-common -nostdlib -nostartfiles -lm -lgcc -T ../riscv64-baremetal/link.ld
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xe714
Errors detected
Do you think this is normal, or does printf() have a problem handling floats? Thank you.
You can set -rtl_timeout to a large number. I think the unit is either microseconds or nanoseconds. In other circumstances I've set it to something around 1000000 to stave off the checks. I think there's another way to turn it off, but I can't remember it offhand, or which file that checker is in. Increasing the number should hopefully be sufficient; we've turned it up before to do Linux boots, which can take a couple of days.
10 seconds is quite a long time to run in simulation; I'd guess it'll take a few hours. It may be better to move to FPGA instead. You should be able to run the test with pitonstream, and because you're using our syscalls.c etc., adding -DPITONSTREAM when you compile the benchmark to .riscv will also add the good/bad trap loads to the test, which can be recognised on the FPGA to show pass/fail there. Your bitfile may need to be modified to change the timeout there, but I don't recall how that works. I think it may just be a software-based check in pitonstream on the host, which you should be able to override without recompiling the bitfile.
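To get a feel for how many iterations the 10-second rule implies, the arithmetic can be sketched in C. This assumes CoreMark's seconds are simply ticks divided by a ticks-per-second rate taken from core_portme, that the ticks are core clock cycles, and that one iteration costs about the 527497 ticks reported in the log above. The 50 MHz figure used in the check is purely hypothetical; substitute the real clock once core_portme is fixed. The function names are illustrative.

```c
#include <stdint.h>

/* Seconds represented by a tick count, given the tick rate core_portme
   assumes. If that rate does not match the real clock, every derived
   figure (time, iterations/sec) is wrong. */
static double ticks_to_secs(uint64_t ticks, uint64_t ticks_per_sec)
{
    return (double)ticks / (double)ticks_per_sec;
}

/* Smallest iteration count whose timed run reaches min_secs, assuming each
   iteration costs ticks_per_iter ticks (integer ceiling division). */
static uint64_t iterations_for_min_runtime(uint64_t min_secs,
                                           uint64_t ticks_per_sec,
                                           uint64_t ticks_per_iter)
{
    uint64_t needed_ticks = min_secs * ticks_per_sec;
    return (needed_ticks + ticks_per_iter - 1u) / ticks_per_iter;
}
```

With those assumptions, iterations_for_min_runtime(10, 50000000, 527497) gives 948, and all 5×10^8 of those cycles would have to be simulated, which is why the run is so slow in RTL.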
As for printf, yes. We have a very simple implementation of printf in syscalls.c, util.h, etc. that doesn't support floats because we hadn't needed them. You could add an implementation from elsewhere, or take the easy route: print the values as hex instead and use Python on the command line, for example, to quickly get the corresponding float value.
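The hex workaround amounts to reinterpreting the double's IEEE-754 bits as an integer on the target and reversing that on the host. A minimal sketch in C follows; the function names are hypothetical. Whether the simple printf handles 64-bit %lx is not confirmed here, so printing the upper and lower 32-bit halves with %x may be the safer option.

```c
#include <stdint.h>
#include <string.h>

/* Target side: reinterpret a double's IEEE-754 bits as a 64-bit integer so it
   can be printed with an integer-only printf, e.g. as the two 32-bit halves
   (uint32_t)(bits >> 32) and (uint32_t)bits. */
static uint64_t double_to_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* well-defined type punning */
    return bits;
}

/* Host side: turn the printed bit pattern back into the double value. */
static double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

For example, a printed pattern of 4024000000000000 decodes to 10.0.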