Add instruction encoding entries to drmemtrace
Today, the drmemtrace format does not include instruction encodings, or even opcodes beyond identifying branch types. This is sufficient for functional cache simulation, as in drcachesim, but core simulation needs the operand dependencies and ideally the opcodes. The current solution is to preserve the application binaries for the executable and libraries and map them in during trace analysis to decode the instructions (using the same support built for post-processing offline traces). Adding the encodings into the trace proper would make the trace self-contained. It is also the simplest solution for supporting dynamically-generated code (DGC), which has no binary to query later: https://github.com/DynamoRIO/dynamorio/issues/2062#issuecomment-985707479
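To make the self-contained-trace idea concrete, here is a minimal C++ sketch of how encoding bytes might travel in fixed-size records just ahead of the instruction record they describe. The `Record` layout, the `RecordType` values, and both helper functions are hypothetical and only loosely modeled on drmemtrace's `trace_entry_t`; they are not its actual API. The sketch assumes an 8-byte payload per record, so a longer x86 instruction spills into several encoding records.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical record kinds; not the real drmemtrace type enum.
enum class RecordType : uint16_t { kEncoding, kInstr };

// Simplified fixed-size record: a kind, a byte count, and an 8-byte payload
// holding either an instruction PC or up to 8 raw encoding bytes.
struct Record {
    RecordType type;
    uint16_t size;
    uint64_t payload;
};

// Writes the encoding of one instruction as one or more kEncoding records,
// followed by the kInstr record, so a reader can decode without the binary.
void EmitInstrWithEncoding(std::vector<Record>& out, uint64_t pc,
                           const uint8_t* bytes, std::size_t length) {
    for (std::size_t offset = 0; offset < length; offset += sizeof(uint64_t)) {
        std::size_t chunk = std::min(length - offset, sizeof(uint64_t));
        Record rec{RecordType::kEncoding, static_cast<uint16_t>(chunk), 0};
        std::memcpy(&rec.payload, bytes + offset, chunk);
        out.push_back(rec);
    }
    out.push_back(Record{RecordType::kInstr, static_cast<uint16_t>(length), pc});
}

// Reads from `pos`, gathering encoding bytes until the next kInstr record;
// returns those bytes and stores that instruction's PC in `pc`.
std::vector<uint8_t> ReadNextInstr(const std::vector<Record>& records,
                                   std::size_t& pos, uint64_t& pc) {
    std::vector<uint8_t> bytes;
    while (pos < records.size()) {
        const Record& rec = records[pos++];
        if (rec.type == RecordType::kInstr) {
            pc = rec.payload;
            break;
        }
        const auto* src = reinterpret_cast<const uint8_t*>(&rec.payload);
        bytes.insert(bytes.end(), src, src + rec.size);
    }
    return bytes;
}
```

Because the reader needs nothing but the records, the same decode path would work for DGC and for traces analyzed away from the machine that produced them.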
There are two use cases which could be handled differently:
- DGC, where there is no binary: in this case the opcodes need to be traced at runtime
- Traces collected on Windows but analyzed on Linux, where the Windows binaries cannot be parsed: in this case the opcodes may instead be inserted during raw2trace conversion
Including the instruction encodings would increase trace file sizes by an estimated 75%. One way to reduce that is to include the encoding only for the first dynamic instance of each static instruction (applicable only to unchanging library code, or possibly to jitted code through analysis). This could complicate skipping part of the trace, as in the proposed feature #5538, but we already have precedent for once-only information in physical address markers (#4014) and could solve it by re-emitting the encodings periodically.
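The once-only idea with periodic re-emission can be sketched as a small tracer-side filter. This is a minimal C++ sketch under stated assumptions: `EncodingDeduper`, the instruction-count-based `reemit_interval`, and the `Invalidate` hook for changed (e.g. jitted) code are all hypothetical, not existing drmemtrace code. Clearing the set on a fixed interval means a reader that skips ahead, as #5538 proposes, sees the encoding of every subsequently executed instruction once it passes the next reset point, at most `reemit_interval` instructions later.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical tracer-side filter deciding whether an instruction's encoding
// must be written. Only the first dynamic instance of each PC is emitted, and
// the set is cleared every `reemit_interval` instructions so encodings reappear.
class EncodingDeduper {
public:
    explicit EncodingDeduper(uint64_t reemit_interval)
        : reemit_interval_(reemit_interval) {}

    // Returns true if the encoding for `pc` should be emitted now.
    bool ShouldEmit(uint64_t pc) {
        // At each interval boundary, forget everything so it is re-emitted.
        if (reemit_interval_ > 0 && instr_count_ > 0 &&
            instr_count_ % reemit_interval_ == 0)
            emitted_.clear();
        ++instr_count_;
        return emitted_.insert(pc).second;  // true only on first insertion
    }

    // Hypothetical hook: code at `pc` changed, so its new bytes must be emitted.
    void Invalidate(uint64_t pc) { emitted_.erase(pc); }

private:
    uint64_t reemit_interval_;
    uint64_t instr_count_ = 0;
    std::unordered_set<uint64_t> emitted_;
};
```

An interval measured in instructions is just one choice; re-emitting at the same points where other once-only markers are repeated might be simpler for readers that already handle those.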
There are still several unfinished items:
- Add a way for tools to know when to invalidate cached decodings
- Add encodings to online traces, but under an option since the extra trace_entry_t records affect overhead there
- Add encoding entry count to basic_counts
- Update https://github.com/DynamoRIO/drmemtrace_samples
- Add AArch32 support (Arm vs Thumb LSB pieces are missing)
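For the first item, one possible shape is a per-instruction flag telling a tool that the encoding bytes for a PC were (re)emitted, so any cached decoding for that PC is stale. In this C++ sketch, `InstrView`, its `encoding_is_new` field, and `DecodeCache` are hypothetical names rather than the drmemtrace interface, and "decoding" is stubbed out as a hex string to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-instruction view handed to an analysis tool.
struct InstrView {
    uint64_t pc;
    std::vector<uint8_t> encoding;
    bool encoding_is_new;  // true when the trace (re)emitted bytes for this PC
};

// Tool-side cache of decoded instructions keyed by PC. A cached entry is
// reused until the reader signals that new bytes arrived for that PC.
class DecodeCache {
public:
    const std::string& Get(const InstrView& instr) {
        auto it = cache_.find(instr.pc);
        if (it == cache_.end() || instr.encoding_is_new) {
            ++decode_count_;
            it = cache_.insert_or_assign(instr.pc, Decode(instr.encoding)).first;
        }
        return it->second;
    }

    int decode_count() const { return decode_count_; }

private:
    // Stand-in for a real decoder: renders the bytes as lowercase hex.
    static std::string Decode(const std::vector<uint8_t>& bytes) {
        static const char kHex[] = "0123456789abcdef";
        std::string text;
        for (uint8_t b : bytes) {
            text += kHex[b >> 4];
            text += kHex[b & 0xf];
        }
        return text;
    }

    std::unordered_map<uint64_t, std::string> cache_;
    int decode_count_ = 0;
};
```

Putting the signal on the instruction keeps invalidation decisions in the reader, which already knows when an encoding record was seen, instead of requiring each tool to compare bytes itself.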
Pretty much everything is done, except that the initial online encoding solution is imperfect:
```cpp
// TODO i#5520: Currently we emit the encoding again for every dynamic instance
// of an instruction. We should record which we've emitted and avoid duplicate
// instances (the reader caches prior encodings). For offline we do this
// separately per thread, which makes knowing when code has changed complex as
// any per-thread structures would need a global walk across them on a fragment
// deletion event. For online, however, we may be able to emit just once globally
// if the reader is always interleaving all threads: but while that simplifies
// invalidation/code changes it requires global locks to update the structure.
// Since encodings are off by default we leave it as emitting every time
// with corresponding extra overhead for now.
```
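The global, emit-once option the TODO describes might look like the following C++ sketch, assuming the online reader always interleaves all threads so that a single shared record of emitted PCs is meaningful. `GlobalEncodingTracker` and its `InvalidateRange` hook for a fragment-deletion event are hypothetical. The sketch shows both sides of the tradeoff: invalidation touches one structure instead of walking per-thread tables, but every lookup takes a global lock.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <set>

// Hypothetical process-wide record of which PCs have had their encodings
// emitted. Shared by all threads, so every access takes one global lock.
class GlobalEncodingTracker {
public:
    // Returns true if the calling thread should emit the encoding for `pc`.
    bool ShouldEmit(uint64_t pc) {
        std::lock_guard<std::mutex> guard(lock_);
        return emitted_.insert(pc).second;
    }

    // Hypothetical hook for a fragment-deletion event covering [start, end):
    // forget those PCs so any replacement code has its encodings re-emitted.
    void InvalidateRange(uint64_t start, uint64_t end) {
        std::lock_guard<std::mutex> guard(lock_);
        emitted_.erase(emitted_.lower_bound(start), emitted_.lower_bound(end));
    }

private:
    std::mutex lock_;
    std::set<uint64_t> emitted_;  // ordered, so a PC range erases in one call
};
```

A per-thread variant, as used for offline, would drop the lock from `ShouldEmit` but turn `InvalidateRange` into a walk over every thread's set, which is the complexity the TODO points out.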