dynamorio
dynamorio copied to clipboard
Consider removing non-fetched instrs for repstr
Xref #2051 and discussion on PR #4912: https://github.com/DynamoRIO/dynamorio/pull/4912#discussion_r633888366
We should consider removing non-fetched instrs for repstr. Today, if the string loop is executed N times, we'll have the rep stringop instr N times in the instr stream. The last N-1 instances are marked as TRACE_TYPE_INSTR_NO_FETCH so that cache simulators can choose to ignore these duplicate entries.
The thinking today is that maybe core simulators also don't need these duplicate instrs, and we should just have only one single instr entry and then multiple memref entries based on loop count.
If we implemented #4948 and changed the underlying rep string expansion, the non-fetched instrs would disappear on their own and we'd just change the docs for the update.
Online traces incorrectly mark a continuing rep string loop as "fetched" for the first instr after a thread switch. Lumping here rather than a separate issue we want to fix b/c we plan to remove non-fetched. I hit this in PR #4974:
::14112:13616:: @76F68FA5 non-fetched instr x2
::14112.13616:: @76F68FA5 write 00EA9FBC x4
::14112:13616:: @76F68FA5 non-fetched instr x2
::14112.13616:: @76F68FA5 write 00EA9FC0 x4
::14112.14428:: marker type 2 value 3083420504
::14112.14428:: marker type 3 value 2
::14112.14428:: @76F62D1C instr x3
...
::14112.14428:: @754D3B02 write 00CFFE48 x4
::14112.13616:: marker type 2 value 3083420504
::14112.13616:: marker type 3 value 10
::14112.13616:: @76F68FA5 instr x2
Assertion failed: TESTALL(OFFLINE_FILE_TYPE_FILTERED, file_type_) || (prev_instr_[memref.data.tid].instr.addr + prev_instr_[memref.data.tid].instr.size == memref.instr.addr) || (prev_instr_[memref.data.tid].instr.addr == memref.instr.addr && memref.instr.type == TRACE_TYPE_INSTR_NO_FETCH) || (prev_xfer_marker_[memref.data.tid].instr.tid != 0 && (prev_xfer_marker_[memref.data.tid].instr.tid != memref.instr.tid || prev_xfer_marker_[memref.data.tid].marker.marker_type == TRACE_MARKER_TYPE_KERNEL_EVENT || prev_xfer_marker_[memref.data.tid].marker.marker_type == TRACE_MARKER_TYPE_KERNEL_XFER)) || prev_instr_[memref.data.tid].instr.type == TRACE_TYPE_INSTR_SYSENTER, file D:\derek\dr\git\src\clients\drcachesim\tests\trace_invariants.cpp, line 167
Trying to re-write the choices here:
A) On entry if TLS flag not set, record instr fetch and set flag; on final iter, clear flag. I assume have to try to clear flag on abnormal loop break too: a signal. So adds overhead and extra branches.
B) Keep entire loop inside block and unroll 1st iter to do fetch. Has all the downsides of a loop inside a block: messes up DR's trace building, signal delivery delay bounds, unlink flushing, shared fragment deletion promptness, etc.
C) Try to split across two blocks, one for 1st iter and 2nd for rest. Use PC+1 (skipping rep byte) for tag for 2nd and hope app never targets that PC. When see a plain string op have to decode backward I guess or have some recorded state so know it's this artificial loop body.
D) #4915: Remove all but 1st fetch in raw2trace, and add analysis to do the same when dumping a trace buffer to get the live instr counts to match too -- which also fixes online. But then instead of a single contiguous buffer being written/transmitted you have to do it in many pieces to skip the extra fetches, or waste time memcpying it on top of itself back to contiguous.
Whoops that last comment was meant for #4948 with this issue being choice D
We had an offline chat where we discussed that option (A) is doable. Note that ensuring a single instr entry gets trickier if the repstr execution is interrupted by a signal; this is because we'll need to save and restore the mentioned TLS flag on entry and exit from the signal handler, and there can be nested signals too. But we decided that it's actually better to emit a new instr entry when the repstr expansion resumes after such an interruption; this is because the repstr instruction would actually be re-fetched by the processor pipeline. So all we need to do on an signal is to clear the mentioned TLS flag.
Question: do we want the same behavior on interruption of the scatter/gather expansion by a fault? Note that the scatter/gather expansion cannot be interrupted by an async signal (DR would delay the async signal, as the expansion is a single fragment that's executed just once per dynamic scatter/gather instruction, unlike the repstr where the same fragment is executed N times), so this is only for a scatter/gather instr resuming after a fault. Today we add a single pc entry for the scatter/gather instruction, even if some store/load faults. Note that the scatter/gather expansion does not use a loop, unlike the repstr expansion. So, it is actually more complex to add a new pc entry on resumption after a fault. We'll need to have a TLS-flag-protected instr-entry emit before each memref-emit.
Not sure I understand the concern about scatter-gather: there is no problem today. We don't have to do anything extra for proper resumption of a rep-string or scatter-gather after a signal that is delivered to the app: that will always re-enter the original block from the start but now with a different resumption state (rcx for rep string; mask for scatter-gatther) because we translate the expansion to the original instruction.
that will always re-enter the original block from the start but now with a different resumption state
Oh, so after the fault, DR will not resume at the faulting cache pc, but at the faulting app pc (which will take it to the top of the scatter/gather fragment). Yes in that case we're fine, as the scatter/gather instr entry will indeed be emitted again.
The online failure to identify non-fetched after a thread switch lumped into this issue at https://github.com/DynamoRIO/dynamorio/issues/4915#issuecomment-870029848 can result in a PC discontinuity error:
1184144 760077: 9972 ifetch 3 byte(s) @ 0x76266f9f non-branch
1184145 760078: 9972 ifetch 2 byte(s) @ 0x76266fa2 non-branch
1184146 760078: 9972 write 4 byte(s) @ 0x009bfb08 by PC 0x76266fa2
1184147 760078: 9972 ifetch 2 byte(s) @ 0x76266fa2 non-fetched instruction
1184148 760078: 9972 write 4 byte(s) @ 0x009bfb0c by PC 0x76266fa2
1184149 760078: 9972 ifetch 2 byte(s) @ 0x76266fa2 non-fetched instruction
1184150 760078: 9972 write 4 byte(s) @ 0x009bfb10 by PC 0x76266fa2
2
------------------------------------------------------------
1184151 760079: 4424 ifetch 5 byte(s) @ 0x773beaf7 non-branch
<...>
1184206 760107: 4424 write 4 byte(s) @ 0x02c2f454 by PC 0x773beb0f
------------------------------------------------------------
1184207 760108: 9972 ifetch 2 byte(s) @ 0x76266fa2 non-branch
Trace invariant failure in T9972 at ref # 1184207 (69 instrs since timestamp 1660902515): Non-explicit control flow has no marker