rv32emu
Implement instruction usage histogram
With the ability to record and print histograms, we can observe how frequently each instruction is executed. Sample output:
instruction usage histogram
~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. lw 16.37% [843055543] #######################################
2. xor 13.85% [713031972] ################################
3. add 13.69% [704643870] ################################
4. slli 13.20% [679477645] ###############################
5. srliw 10.75% [553648296] #########################
6. andi 9.94% [511705332] #######################
7. srli 3.05% [157286473] #######
8. lbu 2.93% [150995155] ######
9. addi 2.69% [138412722] ######
10. addiw 2.44% [125829177] #####
11. sd 1.79% [92275184 ] ####
12. sb 1.47% [75497501 ] ###
13. jal 1.14% [58720476 ] ##
14. beq 1.06% [54526059 ] ##
15. ld 0.98% [50332182 ] ##
16. and 0.98% [50331715 ] ##
17. slliw 0.98% [50331682 ] ##
18. bne 0.90% [46137534 ] ##
19. or 0.65% [33554442 ] #
20. auipc 0.41% [20971539 ]
21. lui 0.24% [12583002 ]
22. mulw 0.16% [8388608 ]
23. lwu 0.16% [8388608 ]
24. jalr 0.08% [4194443 ]
25. sraiw 0.08% [4194314 ]
26. sw 0.00% [213 ]
27. bltu 0.00% [78 ]
28. bge 0.00% [39 ]
29. blt 0.00% [33 ]
30. bgeu 0.00% [33 ]
31. sub 0.00% [29 ]
...
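The output above could be produced along these lines. This is only a minimal sketch, not rv32emu's or rv8's actual code: `hist_entry_t`, `hist_cmp_desc`, `hist_bar_len`, and `hist_print` are hypothetical names, and scaling each bar against the largest count is an assumption about how the bars are sized.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* One histogram row: a mnemonic and how many times it was executed. */
typedef struct {
    const char *name;
    uint64_t count;
} hist_entry_t;

/* qsort comparator: larger counts sort first. */
int hist_cmp_desc(const void *a, const void *b)
{
    const hist_entry_t *x = a, *y = b;
    return (x->count < y->count) - (x->count > y->count);
}

/* Bar length, scaled so that the largest count fills max_width columns.
 * Integer math; assumes count * max_width does not overflow 64 bits. */
int hist_bar_len(uint64_t count, uint64_t max_count, int max_width)
{
    if (max_count == 0)
        return 0;
    return (int) (count * (uint64_t) max_width / max_count);
}

/* Sort the entries in place, then print rank, mnemonic, share, count, bar. */
void hist_print(FILE *out, hist_entry_t *e, size_t n, int max_width)
{
    assert(out && (e || n == 0));
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += e[i].count;
    if (n > 0)
        qsort(e, n, sizeof(*e), hist_cmp_desc);

    fprintf(out, "instruction usage histogram\n");
    /* Entries are sorted, so stop at the first instruction never executed. */
    for (size_t i = 0; i < n && e[i].count > 0; i++) {
        fprintf(out, "%3zu. %-8s %5.2f%% [%-10llu] ", i + 1, e[i].name,
                100.0 * (double) e[i].count / (double) total,
                (unsigned long long) e[i].count);
        for (int j = hist_bar_len(e[i].count, e[0].count, max_width); j > 0; j--)
            fputc('#', out);
        fputc('\n', out);
    }
}
```

In the emulator, the decode or dispatch loop would increment `count` for the matching entry, and `hist_print(stdout, table, n, 40)` would be called once at exit.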
Reference:
- rv8: generates instruction histograms via "rv-bin histogram"
The author of rv8 showed a neat technique to render a histogram. See https://github.com/microsoft/mimalloc/pull/529
register usage histogram generated by rv8.
I imported the LRU cache in this commit, so maybe the profiling tool could also display cache information, such as the cache size, cache hit rate, cache miss rate, the hit count of each basic block, and the PC of that basic block. For example,
Total cache access: xxx
Cache miss rate: xx.x%
Cache hit rate: xx.x%
Block PC | hit times | instruction in this block |
--------- | ------------ | ---------------------------- |
0x4 | xxx [xx.x%] | LW ADD BEQ |
.
.
.
See how mimalloc displays its internal information depending on the environment variables MIMALLOC_VERBOSE and MIMALLOC_SHOW_STATS. Of course, the feature can be turned off via build-time flags.
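A rough sketch of how such a report might be gated at run time and at build time, in the spirit of mimalloc's environment variables. Everything named here is an assumption made for illustration: the variable `RV32EMU_SHOW_STATS`, the macro `RV_CACHE_STATS`, and the `block_stat_t` layout are hypothetical and not part of rv32emu.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical build-time switch: compile with -DRV_CACHE_STATS=0 to disable. */
#ifndef RV_CACHE_STATS
#define RV_CACHE_STATS 1
#endif

/* Per-block record (hypothetical layout). */
typedef struct {
    uint32_t pc;       /* start PC of the basic block */
    uint64_t hits;     /* cache hits for this block */
    const char *insns; /* mnemonics in the block, e.g. "LW ADD BEQ" */
} block_stat_t;

/* Percentage of part in total; 0 when nothing was recorded. */
double stat_percent(uint64_t part, uint64_t total)
{
    return total ? 100.0 * (double) part / (double) total : 0.0;
}

/* Assumed convention: unset, empty, or "0" means off; anything else is on. */
int stat_flag_on(const char *value)
{
    return value && value[0] != '\0' && !(value[0] == '0' && value[1] == '\0');
}

/* Print the summary and the per-block table only when enabled. */
void cache_stats_print(FILE *out, uint64_t hits, uint64_t misses,
                       const block_stat_t *b, size_t n)
{
    if (!RV_CACHE_STATS || !stat_flag_on(getenv("RV32EMU_SHOW_STATS")))
        return;
    assert(out && (b || n == 0));

    uint64_t total = hits + misses;
    fprintf(out, "Total cache access: %llu\n", (unsigned long long) total);
    fprintf(out, "Cache miss rate: %.1f%%\n", stat_percent(misses, total));
    fprintf(out, "Cache hit rate: %.1f%%\n", stat_percent(hits, total));
    fprintf(out, "Block PC | hit times | instruction in this block |\n");
    for (size_t i = 0; i < n; i++)
        fprintf(out, "0x%x | %llu [%.1f%%] | %s |\n", (unsigned) b[i].pc,
                (unsigned long long) b[i].hits,
                stat_percent(b[i].hits, hits), b[i].insns);
}
```

Reading the variable once at exit keeps the hot path free of `getenv` calls; only the counters themselves would cost anything during emulation.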
It is feasible to reuse the existing map for the LRU cache implementation. See https://jaeyu.wordpress.com/2014/04/15/lru-cache-in-c/
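For readers unfamiliar with the idea, here is a deliberately small LRU cache keyed by basic-block PC that also counts hits and misses. It is a sketch under stated assumptions, not the implementation from the commit or the linked post: it finds keys by linear scan instead of a map or hash table, tracks recency with a tick counter, and all names (`lru_t`, `lru_get`, `lru_put`, `LRU_CAPACITY`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LRU_CAPACITY 4 /* hypothetical; kept tiny for illustration */

typedef struct {
    uint32_t pc;       /* key: start PC of the basic block */
    void *block;       /* value: the cached block */
    uint64_t last_use; /* tick of the last access; 0 marks an empty slot */
} lru_slot_t;

typedef struct {
    lru_slot_t slot[LRU_CAPACITY];
    uint64_t tick;         /* increases on every access */
    uint64_t hits, misses; /* statistics for the profiling report */
} lru_t;

void lru_init(lru_t *c)
{
    assert(c);
    memset(c, 0, sizeof(*c));
}

/* Look up a block; a hit refreshes its recency. Returns NULL on a miss. */
void *lru_get(lru_t *c, uint32_t pc)
{
    for (int i = 0; i < LRU_CAPACITY; i++) {
        if (c->slot[i].last_use && c->slot[i].pc == pc) {
            c->slot[i].last_use = ++c->tick;
            c->hits++;
            return c->slot[i].block;
        }
    }
    c->misses++;
    return NULL;
}

/* Insert or update a block. Returns the evicted value, or NULL if none. */
void *lru_put(lru_t *c, uint32_t pc, void *block)
{
    int victim = 0;
    for (int i = 0; i < LRU_CAPACITY; i++) {
        if (c->slot[i].last_use && c->slot[i].pc == pc) {
            /* Key already cached: replace the value and refresh recency. */
            c->slot[i].block = block;
            c->slot[i].last_use = ++c->tick;
            return NULL;
        }
        /* Empty slots have tick 0, so they are chosen before any eviction. */
        if (c->slot[i].last_use < c->slot[victim].last_use)
            victim = i;
    }
    void *evicted = c->slot[victim].last_use ? c->slot[victim].block : NULL;
    c->slot[victim] = (lru_slot_t){.pc = pc, .block = block,
                                   .last_use = ++c->tick};
    return evicted;
}
```

A real implementation would replace the linear scans with an index structure (a hash table or the existing map) plus a recency list, which is exactly the trade-off being measured in this thread.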
I tried integrating the existing map into the LRU cache implementation, but the performance is worse than with the original version of the LRU cache.
Performance when running CoreMark
Microprocessor: Core i7-8700, Compiler: gcc-12
- LRU cache with map: 464.427307 (Iterations/Sec)
- LRU cache without map: 978.822116 (Iterations/Sec)
The memory management concerns about basic blocks should be discussed in #105. Here, we still work on the statistics.
rv64_emulator comes with an instruction frequency analyzer for ELF files. See rv_analyzer, which looks quite straightforward.