Opt-out from world step profiling
On macOS, calls to mach_absolute_time take about 40% of the (headless) simulation time. Based on a brief investigation, I assume that this call is mostly related to filling the b2Profile struct. Therefore, avoiding profiling could save almost half of the CPU runtime. I also assume that other platforms will have similar performance gains.
This normally has no overhead. Not an issue on other platforms. Are you going through Rosetta?
@erincatto maybe it becomes more significant in sufficiently simple worlds. I am quite sure that Rosetta is not involved (the binary is native arm64, statically linked with its dependencies, built as RelWithDebInfo).
BTW, I simply rewrote the mentioned function to just return 0 and saved ~30% of CPU runtime: from 5.8 to 4 seconds for 2M iterations. That adds up at scale if I need to run thousands of them in parallel...
I may try to simplify and publish my benchmark so you can reproduce and analyze my results, though.
Another user reported this and I already investigated it in the benchmark app. Profiling didn't slow it down on an M2 MacBook Air. If you think about it, why would a function designed for performance profiling be slow?
I recommend writing a separate program to prove that this function is slow.
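A standalone check along these lines could look like the sketch below. It is not taken from Box2D: `read_ticks`, `wall_seconds`, and `ns_per_tick_call` are hypothetical helper names, and on non-Apple platforms `clock_gettime(CLOCK_MONOTONIC)` is used as a stand-in for `mach_absolute_time`, so the numbers are only comparable within one platform.

```c
#if !defined(__APPLE__) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L /* exposes clock_gettime on strict C builds */
#endif
#include <stdint.h>
#include <time.h>
#ifdef __APPLE__
#include <mach/mach_time.h>
#endif

/* Hypothetical helper: read the raw tick counter.
   macOS: mach_absolute_time. Elsewhere: monotonic clock in nanoseconds
   (an assumed stand-in, not necessarily what Box2D uses there). */
static uint64_t read_ticks(void)
{
#ifdef __APPLE__
    return mach_absolute_time();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Monotonic wall-clock time in seconds, used to time the loop itself. */
static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Average cost, in nanoseconds, of one read_ticks() call.
   `calls` must be greater than zero. */
static double ns_per_tick_call(uint64_t calls)
{
    volatile uint64_t sink = 0; /* keeps the loop from being optimized away */
    double start = wall_seconds();
    for (uint64_t i = 0; i < calls; ++i)
    {
        sink += read_ticks();
    }
    double elapsed = wall_seconds() - start;
    (void)sink;
    return elapsed * 1e9 / (double)calls;
}
```

Calling `ns_per_tick_call(2000000)` from a small `main` and printing the result gives a per-call cost that can be multiplied by the number of timer calls per world step and compared against the measured step time.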
I reviewed usage of b2GetTicks and it is not called in any inner loops.
Okay, reading this again, it sounds like you are running many worlds in parallel. Perhaps mach_absolute_time doesn't like to be called from multiple threads simultaneously. I'm not really optimizing the case of many small worlds in Box2D. For example, there are no benchmarks for many small worlds.
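The multi-threading hypothesis could be tested with a sketch like the following, which has several threads call the timer at the same time and reports the wall-clock cost per call as seen by each thread. Everything here is an assumption for illustration: `timer_ticks`, `now_seconds`, `WorkerArgs`, `MAX_THREADS`, and `contended_ns_per_call` are hypothetical names, and non-Apple platforms use `clock_gettime(CLOCK_MONOTONIC)` in place of `mach_absolute_time`.

```c
#if !defined(__APPLE__) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L /* exposes clock_gettime and pthreads on strict C builds */
#endif
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#ifdef __APPLE__
#include <mach/mach_time.h>
#endif

#define MAX_THREADS 64 /* arbitrary cap for this sketch */

/* Timer under test: mach_absolute_time on macOS, monotonic clock elsewhere. */
static uint64_t timer_ticks(void)
{
#ifdef __APPLE__
    return mach_absolute_time();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

typedef struct
{
    uint64_t calls; /* timer calls this thread should make */
    uint64_t sink;  /* accumulated result, stored so the work is observable */
} WorkerArgs;

static void* worker(void* p)
{
    WorkerArgs* args = p;
    uint64_t sum = 0;
    for (uint64_t i = 0; i < args->calls; ++i)
    {
        sum += timer_ticks();
    }
    args->sink = sum;
    return NULL;
}

/* Wall-clock nanoseconds per call per thread while thread_count threads
   call the timer concurrently. Includes thread start-up time.
   Returns a negative value on invalid input or thread creation failure. */
static double contended_ns_per_call(int thread_count, uint64_t calls_per_thread)
{
    pthread_t threads[MAX_THREADS];
    WorkerArgs args[MAX_THREADS];
    if (thread_count < 1 || thread_count > MAX_THREADS || calls_per_thread == 0)
    {
        return -1.0;
    }
    double start = now_seconds();
    int started = 0;
    for (; started < thread_count; ++started)
    {
        args[started].calls = calls_per_thread;
        args[started].sink = 0;
        if (pthread_create(&threads[started], NULL, worker, &args[started]) != 0)
        {
            break;
        }
    }
    for (int i = 0; i < started; ++i)
    {
        pthread_join(threads[i], NULL);
    }
    double elapsed = now_seconds() - start;
    if (started != thread_count)
    {
        return -1.0;
    }
    return elapsed * 1e9 / (double)calls_per_thread;
}
```

If the timer scales well, the value for 1 thread and the value for as many threads as there are physical cores should be close. A per-call cost that grows with the thread count would support the contention explanation.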