
Magma/CoreIR Simulator Performance Slow

Open David-Durst opened this issue 6 years ago • 4 comments

https://github.com/David-Durst/aetherling/blob/4cc822aa06eae30a01f539524b2b585d6f73f9bc/tests/haskell/test_downsampleStencil.py#L37

The above line of the test takes 3-5 minutes. However, the Python file that the test loads the circuit from (https://github.com/David-Durst/aetherling/blob/4cc822aa06eae30a01f539524b2b585d6f73f9bc/tests/haskell/downsampleStencilChain1Per64.py) takes only ~10 seconds to generate CoreIR when the following is appended to it:

magma.compile("downsampleStencilChain1Per64", downsampleStencilChain1Per64, output="verilog", passes=["rungenerators", "wireclocks-coreir", "verifyconnectivity --noclkrst", "flattentypes", "flatten", "verifyconnectivity --noclkrst", "deletedeadinstances"], namespaces=["aetherlinglib", "commonlib", "mantle", "coreir", "global"], context=c)

Why is it taking so long for the simulator to get itself ready? Does switching to fault fix this?
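One way to narrow down where the setup time goes, before reaching for a full profiler, is to time each phase separately. The sketch below is a generic helper built on `time.perf_counter`; the phase labels and the `pass` placeholders are hypothetical and stand in for whatever loading, simulator-construction, and stepping code the test actually runs. They are not part of magma's or fault's API.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, log):
    """Record the wall-clock duration of the enclosed block in log[label], in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log[label] = time.perf_counter() - start


timings = {}
with timed("load circuit", timings):
    pass  # hypothetical placeholder: import the generated circuit module here
with timed("prepare simulator", timings):
    pass  # hypothetical placeholder: build the simulator / run the CoreIR passes here
with timed("run simulation", timings):
    pass  # hypothetical placeholder: step through the test vectors here

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

Because the duration is recorded in a `finally` block, a phase that raises still gets a timing entry, which helps when the slow step is also the one that fails.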

David-Durst, Feb 13 '19 21:02

Here is a cProfile trace of the unit test at https://github.com/David-Durst/aetherling/blob/238811583c2cfc8225771f6c5d80cddda3afcea4/tests/haskell/test_downsampleStencil.py#L37.

aetherling_downsample_stencil_chain_performance_log.zip

The test takes 7 minutes and 47 seconds on a 2017 MacBook Pro with a 3.1 GHz Intel Core i7, with nothing else running that uses a significant amount of CPU. Immediately after the test, the CPU is more than 80% idle.

CoreIR compilation and simulation took more than 94% of the total time, as the attached screenshot shows (screen shot 2019-02-14 at 9 37 55 pm).
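A breakdown like the 94% figure can be approximated from a cProfile run by summing each function's self time per source file. This is a sketch under stated assumptions, not the method used to produce the attached screenshot: it reads the `stats` dictionary that `pstats.Stats` populates, and it assumes the CoreIR Python bindings live under a path containing `coreir`. Time spent in native code may be attributed either to the calling Python function or to a built-in entry (filename `~`), so the result is only an approximation. `run_downsample_test` in the usage comment is a hypothetical placeholder.

```python
import cProfile
import pstats
from collections import defaultdict


def profile_by_file(func, *args, **kwargs):
    """Run func under cProfile; return (result, {filename: total self time in seconds})."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stats = pstats.Stats(profiler)
    per_file = defaultdict(float)
    # stats.stats maps (filename, lineno, funcname) ->
    # (primitive calls, total calls, self time, cumulative time, callers)
    for (filename, _lineno, _name), (_cc, _nc, tottime, _ct, _callers) in stats.stats.items():
        per_file[filename] += tottime
    return result, dict(per_file)


def share_matching(per_file, substring):
    """Fraction of total self time spent in files whose path contains substring."""
    total = sum(per_file.values())
    if total == 0:
        return 0.0
    matched = sum(t for name, t in per_file.items() if substring in name)
    return matched / total


# Hypothetical usage (run_downsample_test is a placeholder for the unit test body):
# _, per_file = profile_by_file(run_downsample_test)
# print(f"{share_matching(per_file, 'coreir'):.0%} of self time in CoreIR files")
```

Summing self time (rather than cumulative time) avoids double counting, since every second of the run is assigned to exactly one function entry.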

David-Durst, Feb 15 '19 05:02

For comparison, I am running the much larger tests at https://github.com/David-Durst/aetherling/blob/master/tests/haskell/test_downsampleStencil_big.py in 9 minutes and 31 seconds using fault and Verilator. (Note: to run these Verilator tests you need https://github.com/David-Durst/fault/tree/aetherling_debugging, because I made some custom printing modifications to Verilator.)

That is 7 Verilator tests, each with an input image 256 times larger, which together run in roughly the same time as the single test in the CoreIR simulator.
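As a rough back-of-the-envelope check, the two wall-clock figures in this thread can be turned into a throughput ratio. This assumes runtime scales linearly with input image size and that one Verilator test at the original input size does the same work as the single CoreIR test. Both totals also include compilation, and the CoreIR figure comes from a profiled run, so treat the result as an order-of-magnitude estimate only.

```python
# Wall-clock figures quoted in this thread, in seconds. Both include compilation time.
COREIR_SECONDS = 7 * 60 + 47     # 1 test in the CoreIR simulator (cProfile run)
VERILATOR_SECONDS = 9 * 60 + 31  # 7 tests with fault + Verilator
VERILATOR_TESTS = 7
INPUT_SCALE = 256                # each Verilator test uses a 256x larger input image


def relative_throughput(coreir_s, verilator_s, tests, scale):
    """Verilator's work per second divided by the CoreIR simulator's work per second.

    Assumes runtime is linear in input size and that one Verilator test at 1x input
    does the same work as the single CoreIR test.
    """
    coreir_rate = 1 / coreir_s                     # 1 unit of work
    verilator_rate = tests * scale / verilator_s   # 7 * 256 units of work
    return verilator_rate / coreir_rate


ratio = relative_throughput(COREIR_SECONDS, VERILATOR_SECONDS, VERILATOR_TESTS, INPUT_SCALE)
print(f"Verilator processed roughly {ratio:.0f}x more input per second")  # about 1466x
```

The estimate is sensitive to these assumptions; for example, if most of the CoreIR time is one-off setup rather than simulation, the pure simulation-speed gap would differ from this number.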

David-Durst, Feb 17 '19 00:02

@David-Durst I don't know why the passes that prepare the CoreIR circuit take so long, though @rdaly525 may have more insight on that.

As for the simulator's performance: the CoreIR simulator is an interpreter, so it will be much slower than Verilator, which is a compiled-code simulator.

Verilator typically runs at least 100x faster than existing, well-established interpreted simulators such as Icarus Verilog. If you really need high-performance circuit simulation, you will have to use a compiled-code simulator.
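To illustrate the interpreter versus compiled-code distinction, here is a toy Python sketch. It is not how the CoreIR simulator or Verilator are implemented (Verilator translates the design into C++ that is then compiled to native code), and the netlist format and function names are hypothetical. The interpreter looks up every node's operator and operands on every evaluation, while the "compiled" version generates straight-line Python source once and reuses the resulting function, removing the per-node dispatch.

```python
import operator

# Hypothetical toy netlist in topological order: (destination wire, op, input a, input b).
# Bit widths are ignored; values are plain Python ints.
NETLIST = [
    ("t0", "add", "x", "y"),
    ("t1", "mul", "t0", "x"),
    ("out", "xor", "t1", "y"),
]

OPS = {"add": operator.add, "mul": operator.mul, "xor": operator.xor}
SYMBOLS = {"add": "+", "mul": "*", "xor": "^"}


def interpret(netlist, inputs):
    """Interpreter style: dispatch on each node's operator every time the circuit is evaluated."""
    wires = dict(inputs)
    for dst, op, a, b in netlist:
        wires[dst] = OPS[op](wires[a], wires[b])
    return wires["out"]


def compile_netlist(netlist):
    """Compiled-code style: emit Python source for the whole netlist once, then reuse it."""
    body = [f"    {dst} = {a} {SYMBOLS[op]} {b}" for dst, op, a, b in netlist]
    src = "def step(x, y):\n" + "\n".join(body) + "\n    return out\n"
    namespace = {}
    exec(src, namespace)  # defines step() in namespace
    return namespace["step"]


step = compile_netlist(NETLIST)
for x, y in [(1, 2), (3, 4), (10, 7)]:
    assert interpret(NETLIST, {"x": x, "y": y}) == step(x, y)
```

Timing both versions with `timeit` over many evaluations should expose the per-node dispatch cost that the generated function avoids; in a real compiled-code simulator the gap is larger still, because the generated code is native rather than Python.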

dillonhuff, Feb 17 '19 06:02

@shacklettbp here's an issue related to CoreIR simulator performance. Let us know if you need help getting set up with the requirements to reproduce it.

leonardt, Feb 27 '19 22:02