panda icon indicating copy to clipboard operation
panda copied to clipboard

Fix memory callback coverage

Open lacraig2 opened this issue 3 years ago • 1 comments

What this PR does:

  • Makes the decision for memory callback a run-time decision instead of a translate time decision
  • Removes translate-time installation of qemu_ld_helpers_panda based on panda_use_memcb
  • Moves original helper_[endian]_[ld/st]_name functions into the inline static functions of the form helper_[endian]_[ld/st]_name_inner (and changes internal calls to this form as well)
  • Replaces helper_[endian]_[ld/st]_name functions with PANDA equivalent that checks panda_use_memcb at runtime
  • Keeps helper_[endian]_[ld/st]_name_panda functions for compatibility.
  • Adds panda memory callbacks to fast case of cpu_ldst_template.h

The issue:

See #1119 for context.

A bug was reported that indicated that the x86 instruction cmpxchg (32 or 64-bit versions) did not generate memory callbacks. The examination in the PR is somewhat different from my analysis. First, the relevant functions are not:

void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
void helper_cmpxchg16b(CPUX86State *env, target_ulong a0)

because when generating for single core we generate _unlocked versions. https://github.com/panda-re/panda/blob/0c15599d7775cf7978959f0901b998add06171a1/target/i386/translate.c#L5269-L5273

The unlocked versions use cpu_ldst functions like @AndrewFasano mentioned. https://github.com/panda-re/panda/blob/0c15599d7775cf7978959f0901b998add06171a1/target/i386/mem_helper.c#L28-L52

It seems to me that there are two problems:

  • The first is that functions of the form helper_[endian]_st[size]_mmu_panda are only replaced here https://github.com/panda-re/panda/blob/d2ee3d1bde15623813bcae1708eaa7d759cedd96/tcg/i386/tcg-target.inc.c#L1223-L1242 which leaves it open to be called by other APIs, which misses our callback infrastructure. Such is the case for the cpu_ldst framework. https://github.com/panda-re/panda/blob/dev/include/exec/cpu_ldst_template.h#L102-L103
  • The second is that within the cpu_ldst framework we do not cover the case where ld[size]_p is called directly on translated memory. https://github.com/panda-re/panda/blob/dev/include/exec/cpu_ldst_template.h#L105-L106

Advantages

  • This PR allows for memory callbacks to be truly dynamically enabled/disabled. Prior to this the decision was made at translate-time. If you wanted to truly disable mem callbacks you'd need to clear the TB cache. This PR changes the decision to runtime.
  • This PR allows PANDA to cover a wider range of data accesses on x86 that were otherwise missed. This includes the cpu_ldst functions, but also includes direct calls from other instruction such as:
    • FLDT: https://github.com/panda-re/panda/blob/6ed3f19cd0399d452b1e5744c2ed9ca672fe2b7c/target/i386/fpu_helper.c#L72-L80
    • BNDLDX: https://github.com/panda-re/panda/blob/6ed3f19cd0399d452b1e5744c2ed9ca672fe2b7c/target/i386/mpx_helper.c#L138-L148
    • BNDSTX: https://github.com/panda-re/panda/blob/6ed3f19cd0399d452b1e5744c2ed9ca672fe2b7c/target/i386/mpx_helper.c#L105-L120
  • Additionally, there are likely benefits to other architectures. There are several examples of other architectures using this style. This one is from PPC:
    • https://github.com/panda-re/panda/blob/6ed3f19cd0399d452b1e5744c2ed9ca672fe2b7c/target/ppc/mem_helper.c#L56-L66

Potential issues

  • This PR moves the decision to use or not use a memory callback to runtime. That is at least one additional resolution of panda_use_memcb for every memory access. The function makes the case for not using memory callbacks as fast as possible. It inlines the internal version, but it's still a change that could impact performance. https://github.com/panda-re/panda/blob/6ed3f19cd0399d452b1e5744c2ed9ca672fe2b7c/softmmu_template.h#L536-L543

Design decisions

Instead of replacing helper_[endian]_[ld/st]_name with their PANDA equivalents and moving the old to the _internal versions I could have instead changed the cpu_[ld/st][size]_data_ret functions to use the _panda versions based on the status of panda_use_memcb. I chose not to because I think covering the default case of helper_[endian]_[ld/st]_name with the PANDA version covers far more cases we may have not considered. I also wanted the ability to dynamically enable/disable memory callbacks. If it turns out that that is hugely detrimental to performance I'm happy to consider other options.

Coverage decisions

This PR changes the PANDA model to cover virtual memory read/writes from all system mmu modes. This means every time data is written/read in a helper it will be recorded. Examples include #1179 and #1119. However, it also includes things not previously considered part of the model such as reads made by the FPU, register store/restore, segment reads, and SIMD.

Performance implications

  • Does this PR make a difference compare to two systems with memcb disabled? The performance difference here is almost indistinguishable.

  • Does this PR make a difference compare to two systems with memcb enabled? The relatively minimal performance impact here seems to result from additional reads/writes being hit.

  • In a relatively short recording on x86 there were about 1500 more calls out of around 7.3 million total calls.

  • In an ARM recording there were exactly the same number of calls. ARM doesn't seem to use these calls.

CLOSES: #1119

lacraig2 avatar Jan 11 '22 18:01 lacraig2

Although I am not deep enough into QEMU internals to give a thorough code review, I can confirm that your PR fixes my cmpxchg16b use case (many thanks!).

pcworld avatar Feb 08 '22 11:02 pcworld