How to Count the Number of Executions of Each Dynablock in Box64?

Open wangguidong1999 opened this issue 1 year ago • 8 comments

Hello, I would like to contribute to the Box64 project and would like to count the number of executions of each dynablock. Is there an existing method to achieve this? Alternatively, is it possible to add some logging in the code to track the execution count? Thank you for your help!

wangguidong1999 avatar Jul 17 '24 03:07 wangguidong1999

There is no such thing in box64. DynaRec generates code pretty fast, so we don't really need hot code statistics. If you still want this for some reason, it should be straightforward to add a small header before every dynablock to do the work.

ksco avatar Jul 17 '24 04:07 ksco

I thought about doing something like that before (mainly to track used blocks for deletion), using the Dynablock structure to store that information and some prolog at each dynablock entry point (don't forget blocks can statically or dynamically jump to other blocks), but I didn't implement it for multiple reasons:

  1. It's a waste of CPU cycles
  2. Handling the CALLRET optimization is harder
  3. There are multithreading issues, especially on hardware without proper atomic support, leading to even more CPU cycles spent and to memory barriers
  4. For tracking whether a block is in use or not: it's also tricky to handle native calls that don't return (probably not an issue for tracking the execution count, though)
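
To make the counter-in-the-block idea and point 3 concrete, here is a minimal sketch in C of what such a prolog would have to do. `counted_block_t`, `block_enter` and `block_count` are hypothetical names for illustration only, not Box64's real structure or API, and a real prolog would be emitted as native code by each backend rather than written in C.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a dynablock; NOT Box64's real structure. */
typedef struct {
    void *x64_addr;               /* guest address the block was built from */
    _Atomic uint64_t exec_count;  /* bumped once per entry into the block */
} counted_block_t;

/* What an entry prolog would do, expressed in C. Relaxed ordering is enough
   for a statistics counter: concurrent increments are not lost, but no
   ordering with other memory accesses is implied. */
static inline void block_enter(counted_block_t *b) {
    atomic_fetch_add_explicit(&b->exec_count, 1, memory_order_relaxed);
}

/* Read the current count, e.g. when dumping statistics at exit. */
static inline uint64_t block_count(counted_block_t *b) {
    return atomic_load_explicit(&b->exec_count, memory_order_relaxed);
}
```

On hardware without a native atomic add, the increment typically becomes a load-linked/store-conditional retry loop, which is where the extra cycles come from.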

What would be the use of those counters?

ptitSeb avatar Jul 17 '24 06:07 ptitSeb

Thank you for your quick reply.

I would like to maintain a fixed-size counter group that only records the count of recently accessed blocks. If a block's execution frequency surpasses a certain threshold, it would be recorded in a table, which is dynamically expanded. The overhead for this method involves spending a few instructions to update the counter each time a block is executed. Each block averages several hundred instructions, and the counting might cost about 10+ instructions, leading to an overhead of approximately 3-5%.
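
A rough sketch of that scheme in C, to make the idea concrete. Everything here is an assumption for illustration: the names (`profiler_t`, `profiler_record`, ...), the slot count, the threshold, the hash on the block address and the "newest block wins" eviction policy are all hypothetical, and none of it is existing Box64 code. It is single-threaded only.

```c
#include <stdint.h>
#include <stdlib.h>

#define SLOT_COUNT    256u   /* fixed size of the counter group (power of two) */
#define HOT_THRESHOLD 1000u  /* executions before a block counts as hot */

typedef struct {
    uintptr_t addr;   /* block address owning this slot, 0 if empty */
    uint32_t  count;  /* executions seen since the slot was claimed */
} slot_t;

typedef struct {
    slot_t     slots[SLOT_COUNT];
    uintptr_t *hot;      /* dynamically grown table of hot block addresses */
    size_t     hot_len;
    size_t     hot_cap;
} profiler_t;

static void profiler_init(profiler_t *p) { *p = (profiler_t){0}; }

/* Linear scan; only reached when a block crosses the threshold. */
static int hot_contains(const profiler_t *p, uintptr_t addr) {
    for (size_t i = 0; i < p->hot_len; i++)
        if (p->hot[i] == addr) return 1;
    return 0;
}

/* Record one execution of the block at addr.
   Returns 1 if this call promoted the block to the hot table,
   0 otherwise, and -1 if growing the hot table failed. */
static int profiler_record(profiler_t *p, uintptr_t addr) {
    slot_t *s = &p->slots[(addr >> 4) & (SLOT_COUNT - 1)];
    if (s->addr != addr) {   /* empty slot or collision: newest block wins */
        s->addr = addr;
        s->count = 0;
    }
    if (++s->count != HOT_THRESHOLD) return 0;
    if (hot_contains(p, addr)) return 0;  /* evicted earlier, hot again */
    if (p->hot_len == p->hot_cap) {
        size_t cap = p->hot_cap ? p->hot_cap * 2 : 16;
        uintptr_t *grown = realloc(p->hot, cap * sizeof *grown);
        if (!grown) return -1;
        p->hot = grown;
        p->hot_cap = cap;
    }
    p->hot[p->hot_len++] = addr;
    return 1;
}
```

The common path is a hash, a compare, and an increment, which is roughly the "10+ instructions" mentioned above; the hot-table growth only happens when a block crosses the threshold.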

I think optimizing frequently executed blocks could improve the performance of dynarec in some programs. (as long as the improvement exceeds the overhead of the counters)

Additionally, I am not familiar with the implementation of JumpTable. I would like to learn about the lookup latency, and is there any room for optimization?

wangguidong1999 avatar Jul 17 '24 07:07 wangguidong1999

Does the 3-5% also include atomic access because of the multi-thread nature of things?

Well, for now, there are no "optimized" settings in the dynarec, so even if you identify a "hot block", you can mark it as dirty for regeneration for example, but there is no secondary, more optimized generator available for now...

About the JumpTable, yes, I have to write a blog entry about that. It's fairly complex in concept, but quite optimized in execution. Not much room for improvement in its current form (and it's something like the 3rd attempt to get fast links between blocks).

ptitSeb avatar Jul 17 '24 08:07 ptitSeb

3-5% is an estimated value that does not take into account the cost of atomic access. I am only considering single-threaded apps for now.
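
As a rough check on that figure, assuming cost scales with instruction count, with $c \approx 10$ counting instructions per entry and a block of $n \approx 200$ to $300$ instructions (illustrative values picked from "several hundred", not measurements):

$$
\text{overhead} \approx \frac{c}{n}, \qquad \frac{10}{300} \approx 3.3\%, \qquad \frac{10}{200} = 5\%
$$

Shorter blocks, or blocks that exit early, would pay proportionally more.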

Looking forward to your blog about JumpTable. I will appreciate it a lot.

Btw, do you have any optimization methods on your to-do list? I have a strong interest in contributing to this project.

wangguidong1999 avatar Jul 17 '24 09:07 wangguidong1999

> Btw, do you have any optimization methods on your to-do list? I have a strong interest in contributing to this project.

Not for now. I do have plans to introduce disk-cached dynarec blocks, which would allow for AoT compilation for example, but it's on the long-term TODO, nothing immediate...

ptitSeb avatar Jul 17 '24 10:07 ptitSeb

There is room for a custom peephole optimizer for each backend we have (without breaking the precise signal processing), but it's not an easy task.

ksco avatar Jul 17 '24 10:07 ksco

By the way, I am wondering if there might still be room for optimizing the translation of x86 flags in Box64?

wangguidong1999 avatar Sep 02 '24 08:09 wangguidong1999

@wangguidong1999 optimizations for handling x86 flags are always ongoing.

I believe the original discussion has been addressed so I'll close this out. Feel free to open up a new GitHub Issue if you have any more questions.

LukeShortCloud avatar Oct 30 '25 09:10 LukeShortCloud