SpinalHDL
Provide some API to speed up SpinalSim hardware signal read
I just measured the time needed to read a signal value using the Verilator backend:
bt1.toLong => 40 ns
vs
manager.getLong(signal1) => 5 ns
with the redundant lookup code hoisted out (done once up front):
val manager = SimManagerContext.current.manager
val signal1 = manager.raw.userData.asInstanceOf[ArrayBuffer[Signal]](bt1.algoInt)
So, adding an API that provides optimized signal access could really help speed things up for performance-critical testbenches.
CPU used for the test: AMD 5800X3D.
Something like :
//Once in sim
val proxy = dut.mem.node.bus.a.address.simProxy()
..
//Many times in sim
val value = proxy.toLong // 5 ns overhead instead of 40
SimProxy being in the spinal.core.sim package:
implicit class SimBitVectorPimper(bt: BitVector) {
  class SimProxy(bt: BitVector) {
    val manager = SimManagerContext.current.manager
    val signal = manager.raw.userData.asInstanceOf[ArrayBuffer[Signal]](bt.algoInt)
    val alwaysZero = bt.getBitsWidth == 0
    def toLong = if (alwaysZero) 0 else manager.getLong(signal)
  }
  def simProxy() = new SimProxy(bt)
}
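For illustration, here is a minimal, self-contained toy (plain Scala; the `Signal`, `ToyBitVector` and `signals` names are stand-ins, not the real SpinalSim classes) showing why caching the resolved signal in a proxy pays off: the pimper path repeats a ThreadLocal lookup plus an array index on every read, while the proxy resolves them once and each later read is a plain field access.

```scala
import scala.collection.mutable.ArrayBuffer

object ProxyToy {
  // Hypothetical stand-in for SpinalSim's Signal, illustration only
  final case class Signal(value: Long)

  // The per-call ThreadLocal lookup is exactly what the proxy caches away
  val signals = new ThreadLocal[ArrayBuffer[Signal]] {
    override def initialValue(): ArrayBuffer[Signal] = ArrayBuffer(Signal(0x42L))
  }

  class ToyBitVector(val algoInt: Int)

  implicit class ToyPimper(bt: ToyBitVector) {
    // Pimper path: ThreadLocal lookup + array index on every single read
    def toLong: Long = signals.get()(bt.algoInt).value

    // Proxy path: resolve the Signal once; later reads are a field access
    class SimProxy {
      private val signal = signals.get()(bt.algoInt)
      def toLong: Long = signal.value
    }
    def simProxy() = new SimProxy
  }
}
```

Both paths return the same value; only the per-call work differs.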
I think the ClockDomain APIs such as waitSampling would speed things up even more if changed like this.
The threaded API would not really get faster, as most of the overhead there comes from the JVM thread pack/unpack and thread switching, I guess.
I did a very simple test by overriding waitSampling(); it speeds things up as follows: test0: 100,000 calls, 1092 ms -> 1027 ms; test1: 1,000,000 calls, 9868 ms -> 9112 ms.
Ahhh, I was expecting less difference ^^
I just did the check by modifying forkStimulus of a ClockDomain; it gives ~10% in a testbench that doesn't do much apart from that clock...
I guess core stuff like the Stream/Flow drivers could also benefit from using a proxy...
OT: When looking at that stuff I checked the difference between using sleep in an endless loop vs. setting up repeated calls with delayed, that performance difference was pretty astounding: sleep is ~20 times slower...
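That gap is plausible: a delayed callback is just a function popped off the event queue and called, while sleep has to park and resume a JVM thread. The callback style can be sketched with a toy discrete-event scheduler (plain Scala, not SpinalSim's actual scheduler; all names here are made up for illustration):

```scala
import scala.collection.mutable

object ToyScheduler {
  private case class Event(time: Long, action: () => Unit)
  // Min-heap on time: negate the key since PriorityQueue dequeues the max
  private val queue =
    mutable.PriorityQueue.empty[Event](Ordering.by[Event, Long](e => -e.time))
  var now = 0L

  // Analogue of a delayed(t){ ... } style API: schedule a plain callback
  def delayed(dt: Long)(action: => Unit): Unit =
    queue.enqueue(Event(now + dt, () => action))

  def run(): Unit = while (queue.nonEmpty) {
    val e = queue.dequeue()
    now = e.time
    e.action() // just a method call: no thread park/unpark involved
  }
}
```

A self-rescheduling callback then replaces the `while(true) { work(); sleep(10) }` loop: each iteration re-registers itself with `delayed` instead of suspending a thread.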
Is it possible to modify the logic of toLong without making things too complicated?
How about using the scala-inline plugin with an @inline annotation to speed up the toLong method?
> Is it possible to modify the logic of toLong without making things too complicated?
As far as I understand: it is currently as close as possible to the toLong method, but the sim functionality is provided by e.g. SimBitVectorPimper, and a new instance of it is created for each implicit conversion that happens in the code (i.e. every use of a BitVector on which you call a function from the pimper).
I think there is no place to store the proxy persistently, and so no way to avoid the ThreadLocal lookup that is the bottleneck here (as far as I've understood).
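That allocation behavior can be shown with a tiny plain-Scala toy (hypothetical names, not the real SimBitVectorPimper): every use of the implicit conversion constructs a fresh wrapper, so any state "cached" inside the pimper is discarded after each call and cannot serve as a persistent proxy.

```scala
object PimperAlloc {
  var allocations = 0

  class BV
  implicit class Pimper(bv: BV) {
    allocations += 1          // constructor body: runs once per implicit conversion
    private val cached = 123L // this "cached" state dies with the instance
    def toLong: Long = cached
  }
}
```

Three calls on the same BV value mean three wrapper constructions, which is why the resolved signal has to live somewhere outside the pimper.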
@andreasWallner
> OT: When looking at that stuff I checked the difference between using sleep in an endless loop vs. setting up repeated calls with delayed, that performance difference was pretty astounding: sleep is ~20 times slower...
Right, unfortunately the JVM doesn't provide any support for coroutines / user-space threads, so the only way to implement that kind of feature was to use JVM threads and switch between them :/
The SpinalSim threaded API is really slow compared to the callback-based API.
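The cost being discussed can be made concrete with a toy (plain Scala; this is not SpinalSim's scheduler, just an illustration of the pattern): a threaded testbench API needs two rendezvous per simulation step, scheduler to testbench thread and back, each one a thread park/unpark, where a callback API would do a single method call.

```scala
import java.util.concurrent.SynchronousQueue

object ThreadedToy {
  // Each "simulation step" needs two handoffs: scheduler -> testbench -> scheduler
  private val toTb  = new SynchronousQueue[Long]()
  private val toSim = new SynchronousQueue[Unit]()
  @volatile var observed = List.empty[Long]

  def run(steps: Int): Unit = {
    val tb = new Thread(() => {
      var i = 0
      while (i < steps) {
        val t = toTb.take() // park until the scheduler hands over control
        observed ::= t      // "testbench" work at simulation time t
        toSim.put(())       // hand control back: the second switch
        i += 1
      }
    })
    tb.start()
    var time = 0L
    while (time < steps) {
      toTb.put(time) // wake the testbench thread
      toSim.take()   // park until it yields
      time += 1
    }
    tb.join()
  }
}
```

Every iteration of the loops above crosses thread boundaries twice, which is the pack/unpack overhead the callback-based API avoids entirely.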
> How about using the scala-inline plugin with an @inline annotation to speed up the toLong method?
Could be tried.