Magma/CoreIR Simulator Performance Slow
https://github.com/David-Durst/aetherling/blob/4cc822aa06eae30a01f539524b2b585d6f73f9bc/tests/haskell/test_downsampleStencil.py#L37
The above line of the test takes 3-5 minutes. However, the Python file that the test loads the circuit from (https://github.com/David-Durst/aetherling/blob/4cc822aa06eae30a01f539524b2b585d6f73f9bc/tests/haskell/downsampleStencilChain1Per64.py) takes only ~10 seconds to generate CoreIR when the following is appended to it:
magma.compile("downsampleStencilChain1Per64", downsampleStencilChain1Per64, output="verilog", passes=["rungenerators", "wireclocks-coreir", "verifyconnectivity --noclkrst", "flattentypes", "flatten", "verifyconnectivity --noclkrst", "deletedeadinstances"], namespaces=["aetherlinglib", "commonlib", "mantle", "coreir", "global"], context=c)
Why is it taking so long for the simulator to get itself ready? Does switching to fault fix this?
Here is a cProfile trace of the unit test at https://github.com/David-Durst/aetherling/blob/238811583c2cfc8225771f6c5d80cddda3afcea4/tests/haskell/test_downsampleStencil.py#L37:
aetherling_downsample_stencil_chain_performance_log.zip
The test takes 7 minutes 47 seconds on a 2017 MacBook Pro with a 3.1 GHz i7 and no other process using a significant amount of CPU. Immediately after the test finishes, the CPU is more than 80% idle.
CoreIR compilation and simulation account for more than 94% of that time, according to the trace above.
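For anyone who wants to check a figure like that on their own run, here is a minimal sketch of capturing a cProfile trace and computing the share of time spent under one function. The functions `slow_phase`, `fast_phase`, and `run_test` are hypothetical stand-ins, not Aetherling or CoreIR code. A saved trace file can be loaded the same way by passing its path to `pstats.Stats`.

```python
import cProfile
import pstats


def slow_phase(n):
    # Hypothetical stand-in for the expensive compile/simulate step.
    return sum(i * i for i in range(n))


def fast_phase(n):
    # Hypothetical stand-in for the rest of the test.
    return sum(range(n))


def run_test():
    slow_phase(200_000)
    fast_phase(1_000)


def cumulative_share(stats, func_name):
    """Fraction of total profiled time spent in func_name and its callees."""
    total = stats.total_tt
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers)
    for (_file, _line, name), (_cc, _nc, _tt, ct, _callers) in stats.stats.items():
        if name == func_name:
            return ct / total if total else 0.0
    return 0.0


profiler = cProfile.Profile()
profiler.enable()
run_test()
profiler.disable()

stats = pstats.Stats(profiler)
share = cumulative_share(stats, "slow_phase")
print(f"slow_phase share of profiled time: {share:.0%}")

# Show the five most expensive entries by cumulative time.
stats.sort_stats("cumulative").print_stats(5)
```

Sorting by cumulative time is usually the quickest way to see which top-level call dominates, since it includes time spent in callees.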

In comparison, I am running the much larger tests at https://github.com/David-Durst/aetherling/blob/master/tests/haskell/test_downsampleStencil_big.py in 9 minutes 31 seconds using fault and verilator. (Note: you need to use https://github.com/David-Durst/fault/tree/aetherling_debugging to run these verilator tests, as I made some custom printing modifications to verilator.)
That is 7 tests in verilator, each with a 256 times larger input image, that all together run in roughly the same time as the 1 test in CoreIR simulator.
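As a rough back-of-the-envelope check on those numbers, the sketch below turns them into a per-unit-of-input comparison. It assumes that work scales linearly with image size and that the seven larger tests do comparable per-pixel work to the small one; both are assumptions, and the timings are end-to-end (they include compilation), so treat the result as an order-of-magnitude estimate only.

```python
# Figures quoted in this thread.
coreir_seconds = 7 * 60 + 47      # one test in the CoreIR simulator
verilator_seconds = 9 * 60 + 31   # seven tests with fault + verilator
verilator_tests = 7
image_scale = 256                 # each verilator input image is 256x larger

# Assumption: cost is proportional to input size, so normalize both
# timings to "seconds per small-image-equivalent of input".
coreir_per_unit = coreir_seconds / 1
verilator_per_unit = verilator_seconds / (verilator_tests * image_scale)

speedup = coreir_per_unit / verilator_per_unit
print(f"approximate end-to-end ratio: {speedup:.0f}x")
```

Under those assumptions the gap comes out at roughly three orders of magnitude.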
@David-Durst I don't know why the passes that prepare the CoreIR circuit take so long, though @rdaly525 may have more insight on that.
As for the performance of the simulator: the CoreIR simulator is an interpreter, so it is going to be much slower than verilator, which is a compiled-code simulator.
Verilator typically runs at least 100x faster than existing, well-established interpreters like Icarus Verilog. If you really need high-performance circuit simulation, you are going to have to use a compiled-code simulator.
@shacklettbp here's one issue related to CoreIR simulator performance. Let us know if you need help getting set up with the requirements to reproduce it.