unicorn Optimize memory handling

By using the memory snapshots with real examples we found some bottlenecks. Therefor I have implemented some optimizations.

We found that the flatview creation is quite expensive (around 1/4 of all runtime including emulation) when you have a lot of memory regions. By permanently creating new regions with CoW we have a lot of regions and rebuild for each cow the flatview again. I have added a update function does only change affected ranges.

Also we found that the find_ram_offset_last optimization doesn't work after a restore. To fix this the context now includes also the ramblock_freed bool.

This also includes two smaller optimizations: only clear the affected tlb on memory_cow and save the last ramblock of the ram_list

It's currently only a draft, because we need to do a few more tests to see if the optimization works as expected.

Jun 06 '24 14:06 PhilippTakacs

I have additional added the flatview to the context. This way flatview doesn't need to be rebuild on context_restore.

I have tested and profiled this. It does in fact save us relevant time.

Jul 03 '24 09:07 PhilippTakacs

@wtdcode have you overseen this PR? It would be nice to get some feedback.

Jul 19 '24 12:07 PhilippTakacs

@wtdcode have you overseen this PR? It would be nice to get some feedback.

Sorry for not leaving a comment. I had a quick review on it and didn't find any obvious problems. I planned to have further debugging to have a better understanding before commenting further.

Jul 19 '24 16:07 wtdcode

I ported d01035767e47f69b4e545d1843dbaf08e6a74752 which generally allows unicorn to go fast qemu_st path (not going through store_helper again) to have better writing performance. This shall have side effects on your memory_cow which relies on store_helper.

I have twinkled a bit on the code to ensure that when snapshot is enabled, no TLB is marked clean. However, I'm not sure if that works for you, you may have a look.

Regarding this PR, I locally tested and think it is good to go if you don't have more to add.

Sep 21 '24 14:09 wtdcode

Thanks for the update. As far as I understand this patch allows faster writes, if the tlb cache entry exists. So for snapshots it might be enough to clear the tlb cache. This way the store_helper would be called again and everything works as expected. I'll test this.

think it is good to go if you don't have more to add

I would like to rebase this to the dev branch to remove the "fixup" commits. Also because I haven't looked at this for a few weeks, I would like to test this again with the current HEAD. I'll ping you when I'm finished testing.

Oct 01 '24 09:10 PhilippTakacs

@wtdcode I have tested the code again and removed the fixup commits so for me this is ready to merge.

For the notdirty write I'm working on a optimization, but I'll open another PR for it.

Oct 11 '24 13:10 PhilippTakacs

@wtdcode I have tested the code again and removed the fixup commits so for me this is ready to merge.

For the notdirty write I'm working on a optimization, but I'll open another PR for it.

Will check today or tomorrow, thanks for your work!

Oct 11 '24 13:10 wtdcode

Okay, I finished the review, and it's safe to go now with minor fixes and a few document updates. Thanks again for your brilliant work.

Oct 16 '24 10:10 wtdcode

I have added the last fixes and rebased it to the dev branch.

The failed test doesn't look like it's a bug by my changes.

Oct 16 '24 12:10 PhilippTakacs

I have added the last fixes and rebased it to the dev branch.

The failed test doesn't look like it's a bug by my changes.

I will rerun and it seems Github Action bug.

Oct 16 '24 12:10 wtdcode

Thanks!

Oct 16 '24 13:10 wtdcode

unicorn unicorn copied to clipboard

Optimize memory handling

unicorn
unicorn copied to clipboard