panda
panda copied to clipboard
Fix memory callback coverage
What this PR does:
- Makes the decision for memory callback a run-time decision instead of a translate time decision
- Removes translate-time installation of
qemu_ld_helpers_panda
based onpanda_use_memcb
- Moves original
helper_[endian]_[ld/st]_name
functions into the inline static functions of the formhelper_[endian]_[ld/st]_name_inner
(and changes internal calls to this form as well) - Replaces
helper_[endian]_[ld/st]_name
functions with PANDA equivalent that checkspanda_use_memcb
at runtime - Keeps
helper_[endian]_[ld/st]_name_panda
functions for compatibility. - Adds panda memory callbacks to fast case of cpu_ldst_template.h
The issue:
See #1119 for context.
A bug was reported that indicated that the x86 instruction cmpxchg
(32 or 64-bit versions) did not generate memory callbacks. The examination in the PR is somewhat different from my analysis. First, the relevant functions are not:
void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
void helper_cmpxchg16b(CPUX86State *env, target_ulong a0)
because when generating for single core we generate _unlocked
versions. https://github.com/panda-re/panda/blob/0c15599d7775cf7978959f0901b998add06171a1/target/i386/translate.c#L5269-L5273
The unlocked versions use cpu_ldst
functions like @AndrewFasano mentioned.
https://github.com/panda-re/panda/blob/0c15599d7775cf7978959f0901b998add06171a1/target/i386/mem_helper.c#L28-L52
It seems to me that there are two problems:
- The first is that functions of the form
helper_[endian]_st[size]_mmu_panda
are only replaced here https://github.com/panda-re/panda/blob/d2ee3d1bde15623813bcae1708eaa7d759cedd96/tcg/i386/tcg-target.inc.c#L1223-L1242 which leaves it open to be called by other APIs, which misses our callback infrastructure. Such is the case for the cpu_ldst framework. https://github.com/panda-re/panda/blob/dev/include/exec/cpu_ldst_template.h#L102-L103 - The second is that within the cpu_ldst framework we do not cover the case where
ld[size]_p
is called directly on translated memory. https://github.com/panda-re/panda/blob/dev/include/exec/cpu_ldst_template.h#L105-L106
Advantages
- This PR allows for memory callbacks to be truly dynamically enabled/disabled. Prior to this the decision was made at translate-time. If you wanted to truly disable mem callbacks you'd need to clear the TB cache. This PR changes the decision to runtime.
- This PR allows PANDA to cover a wider range of data accesses on x86 that were otherwise missed. This includes the
cpu_ldst
functions, but also includes direct calls from other instruction such as:- FLDT: https://github.com/panda-re/panda/blob/6ed3f19cd0399d452b1e5744c2ed9ca672fe2b7c/target/i386/fpu_helper.c#L72-L80
- BNDLDX: https://github.com/panda-re/panda/blob/6ed3f19cd0399d452b1e5744c2ed9ca672fe2b7c/target/i386/mpx_helper.c#L138-L148
- BNDSTX: https://github.com/panda-re/panda/blob/6ed3f19cd0399d452b1e5744c2ed9ca672fe2b7c/target/i386/mpx_helper.c#L105-L120
- Additionally, there are likely benefits to other architectures. There are several examples of other architectures using this style. This one is from PPC:
- https://github.com/panda-re/panda/blob/6ed3f19cd0399d452b1e5744c2ed9ca672fe2b7c/target/ppc/mem_helper.c#L56-L66
Potential issues
- This PR moves the decision to use or not use a memory callback to runtime. That is at least one additional resolution of
panda_use_memcb
for every memory access. The function makes the case for not using memory callbacks as fast as possible. It inlines the internal version, but it's still a change that could impact performance. https://github.com/panda-re/panda/blob/6ed3f19cd0399d452b1e5744c2ed9ca672fe2b7c/softmmu_template.h#L536-L543
Design decisions
Instead of replacing helper_[endian]_[ld/st]_name
with their PANDA equivalents and moving the old to the _internal
versions I could have instead changed the cpu_[ld/st][size]_data_ret
functions to use the _panda
versions based on the status of panda_use_memcb
. I chose not to because I think covering the default case of helper_[endian]_[ld/st]_name
with the PANDA version covers far more cases we may have not considered. I also wanted the ability to dynamically enable/disable memory callbacks. If it turns out that that is hugely detrimental to performance I'm happy to consider other options.
Coverage decisions
This PR changes the PANDA model to cover virtual memory read/writes from all system mmu modes. This means every time data is written/read in a helper it will be recorded. Examples include #1179 and #1119. However, it also includes things not previously considered part of the model such as reads made by the FPU, register store/restore, segment reads, and SIMD.
Performance implications
-
Does this PR make a difference compare to two systems with
memcb
disabled? The performance difference here is almost indistinguishable. -
Does this PR make a difference compare to two systems with
memcb
enabled? The relatively minimal performance impact here seems to result from additional reads/writes being hit. -
In a relatively short recording on x86 there were about 1500 more calls out of around 7.3 million total calls.
-
In an ARM recording there were exactly the same number of calls. ARM doesn't seem to use these calls.
CLOSES: #1119
Although I am not deep enough into QEMU internals to give a thorough code review, I can confirm that your PR fixes my cmpxchg16b
use case (many thanks!).