dynamorio seeing more than one cpu id on one output stream with MAP_TO_RECORDED

Describe the bug

See https://groups.google.com/g/DynamoRIO-Users/c/Dsf4As64r7o

In some of my tests, I am seeing more than one cpu id on one output stream with MAP_TO_RECORDED_OUTPUT.

The value returned by stream->get_output_cpuid matches the initial CPU_ID marker values, but at some point on some streams, we see other CPU_ID values as shown below.

Example:

Output #5 processing CPU #7                   (from get_stream_cpuid)
Output #5 saw CPU_ID marker: 7
Output #5 saw CPU_ID marker: 7
Output #5 saw CPU_ID marker: 7
...
Output #5 saw CPU_ID marker: 7
Output #5 saw CPU_ID marker: 7
Output #5 saw CPU_ID marker: 5

To Reproduce Steps to reproduce the behavior:

Collect a trace with drrun -t drcachesim -offline
Use the scheduler infrastructure with the MAP_TO_RECORDED_OUTPUT option
We see multiple CPU IDs on the same output stream which is not expected

Please also answer these questions:

What happens when you run without any client?
- Not applicable
What happens when you run with debug build ("-debug" flag to drrun/drconfig/drinject)?
- No difference

Expected behavior With the MAP_TO_RECORDED_OUTPUT option, each output stream maps to exactly one CPU

Screenshots or Pasted Text

Versions

What version of DynamoRIO are you using?
- Commit 910ccb25a66175bcba3f349cb7a43c8da7d9f7c3
Does the latest build from https://github.com/DynamoRIO/dynamorio/releases solve the problem?
- No
What operating system version are you running on? ("Windows 10" is not sufficient: give the release number.)
- Ubuntu 18
Is your application 32-bit or 64-bit?
- 64-bit

Additional context

Oct 10 '23 06:10 prasun3

It looks like this is the culprit, this thread with zero instructions:

$ bin64/drrun -t drcachesim -simulator_type view -only_thread 85300 -indir ../src/clients/drcachesim/tests/drmemtrace.schedtest.x64
Output format:
<--record#-> <--instr#->: <---tid---> <record details>
------------------------------------------------------------
           1           0:       85300 <marker: version 6>
           2           0:       85300 <marker: filetype 0xe40>
           3           0:       85300 <marker: cache line size 64>
           4           0:       85300 <marker: chunk instruction count 10000000>
           5           0:       85300 <marker: page size 4096>
           6           0:       85300 <marker: timestamp 13341391202394706>
           7           0:       85300 <marker: tid 85300 on core 1>
           8           0:       85300 <marker: function==syscall #202>
           9           0:       85300 <marker: function return value 0x0>
          10           0:       85300 <marker: timestamp 13341391214912820>
          11           0:       85300 <marker: tid 85300 on core 0>
          12           0:       85300 <thread 85300 exited>
View tool results:
              0 : total instructions

All the scheduling is instruction-count-based so this degenerate thread has 2 entries both at instruction count 0:

$ unzip -p ../src/clients/drcachesim/tests/drmemtrace.schedtest.x64/cpu_schedule.bin.zip 0 | od -t d8 -w32
0000000                85303    13341391204829725                    0                    0
0000040                85304    13341391205074867                    0                    0
0000100                85300    13341391214912820                    0                    0
0000140                85304    13341391214943240                    0               109071
0000200

$ unzip -p ../src/clients/drcachesim/tests/drmemtrace.schedtest.x64/cpu_schedule.bin.zip 1 | od -t d8 -w32
0000000                85300    13341391202394706                    1                    0
0000040                85303    13341391206798536                    1                29985
0000100

The first question is, how did this thread end up having no instructions and still being output: I thought we had a filter to remove empty threads. That filter is based bytes written: so maybe the futex syscall exit (must have attached mid-futex as there's no entry) is what made it past the filter?

Oct 10 '23 20:10 derekbruening

Thanks for the debug! I added a comment here: https://github.com/DynamoRIO/dynamorio/pull/6356#issuecomment-1757043495

Oct 11 '23 07:10 prasun3

The second case from https://github.com/DynamoRIO/dynamorio/pull/6355#issuecomment-1759306715 is rep string:

Input    1 @ 13109305 refs,   3329113 instrs
ERROR: CPU marker 0 on core #1 differs from output stream CPU ID 1

$ for i in 0 1 12; do echo "Core #$i"; unzip -p ../src/clients/drcachesim/tests/drmemtrace.schedtest.x64/cpu_schedule.bin.zip $i | od -t d8 -w32; done
Core #0
0000000               180525    13341574296169946                    0                    0
0000040
Core #1
0000000               180525    13341574300611346                    1              3329113
0000040
Core #12
0000000               180524    13341574296003803                   12                    0
0000040

$ bin64/drrun -t drcachesim -simulator_type view -only_thread 180525 -indir ../src/clients/drcachesim/tests/drmemtrace.schedtest.x64 -skip_instrs 3329110 -sim_refs 30
Output format:
<--record#-> <--instr#->: <---tid---> <record details>
------------------------------------------------------------
    13109282     3329110:      180525 <marker: timestamp 13341574300611269>
    13109283     3329110:      180525 <marker: tid 180525 on core 0>
    13109284     3329111:      180525 ifetch       3 byte(s) @ 0x00007f25505e5f20 48 89 c7             mov    %rax -> %rdi
    13109285     3329112:      180525 ifetch       3 byte(s) @ 0x00007f25505e5f23 48 89 d6             mov    %rdx -> %rsi
    13109286     3329113:      180525 ifetch       3 byte(s) @ 0x00007f25505e5f26 f3 48 a5             rep movs %ds:(%rsi)[8byte] %rsi %rdi %rcx -> %es:(%rdi)[8byte] %rsi %rdi %rcx
    13109287     3329113:      180525 read         8 byte(s) @ 0x00007f25519a6ae0 by PC 0x00007f25505e5f26
    13109288     3329113:      180525 write        8 byte(s) @ 0x00007f25519a6ce0 by PC 0x00007f25505e5f26
    13109289     3329113:      180525 ifetch       3 byte(s) @ 0x00007f25505e5f26 f3 48 a5             rep movs %ds:(%rsi)[8byte] %rsi %rdi %rcx -> %es:(%rdi)[8byte] %rsi %rdi %rcx
    13109290     3329113:      180525 read         8 byte(s) @ 0x00007f25519a6ae8 by PC 0x00007f25505e5f26
    13109291     3329113:      180525 write        8 byte(s) @ 0x00007f25519a6ce8 by PC 0x00007f25505e5f26
    13109292     3329113:      180525 ifetch       3 byte(s) @ 0x00007f25505e5f26 f3 48 a5             rep movs %ds:(%rsi)[8byte] %rsi %rdi %rcx -> %es:(%rdi)[8byte] %rsi %rdi %rcx
    13109293     3329113:      180525 read         8 byte(s) @ 0x00007f25519a6af0 by PC 0x00007f25505e5f26
    13109294     3329113:      180525 write        8 byte(s) @ 0x00007f25519a6cf0 by PC 0x00007f25505e5f26
    13109295     3329113:      180525 ifetch       3 byte(s) @ 0x00007f25505e5f26 f3 48 a5             rep movs %ds:(%rsi)[8byte] %rsi %rdi %rcx -> %es:(%rdi)[8byte] %rsi %rdi %rcx
    13109296     3329113:      180525 read         8 byte(s) @ 0x00007f25519a6af8 by PC 0x00007f25505e5f26
    13109297     3329113:      180525 write        8 byte(s) @ 0x00007f25519a6cf8 by PC 0x00007f25505e5f26
    13109298     3329113:      180525 ifetch       3 byte(s) @ 0x00007f25505e5f26 f3 48 a5             rep movs %ds:(%rsi)[8byte] %rsi %rdi %rcx -> %es:(%rdi)[8byte] %rsi %rdi %rcx
    13109299     3329113:      180525 read         8 byte(s) @ 0x00007f25519a6b00 by PC 0x00007f25505e5f26
    13109300     3329113:      180525 write        8 byte(s) @ 0x00007f25519a6d00 by PC 0x00007f25505e5f26
    13109301     3329113:      180525 ifetch       3 byte(s) @ 0x00007f25505e5f26 f3 48 a5             rep movs %ds:(%rsi)[8byte] %rsi %rdi %rcx -> %es:(%rdi)[8byte] %rsi %rdi %rcx
    13109302     3329113:      180525 read         8 byte(s) @ 0x00007f25519a6b08 by PC 0x00007f25505e5f26
    13109303     3329113:      180525 write        8 byte(s) @ 0x00007f25519a6d08 by PC 0x00007f25505e5f26
    13109304     3329113:      180525 <marker: timestamp 13341574300611280>
    13109305     3329113:      180525 <marker: tid 180525 on core 0>
    13109306     3329113:      180525 <marker: timestamp 13341574300611346>
    13109307     3329113:      180525 <marker: tid 180525 on core 1>
    13109308     3329113:      180525 ifetch       3 byte(s) @ 0x00007f25505e5f26 f3 48 a5             rep movs %ds:(%rsi)[8byte] %rsi %rdi %rcx -> %es:(%rdi)[8byte] %rsi %rdi %rcx
    13109309     3329113:      180525 read         8 byte(s) @ 0x00007f25519a6b10 by PC 0x00007f25505e5f26
    13109310     3329113:      180525 write        8 byte(s) @ 0x00007f25519a6d10 by PC 0x00007f25505e5f26
    13109311     3329113:      180525 ifetch       3 byte(s) @ 0x00007f25505e5f26 f3 48 a5             rep movs %ds:(%rsi)[8byte] %rsi %rdi %rcx -> %es:(%rdi)[8byte] %rsi %rdi %rcx

All the rep strings have the same instr count: making it not work well as a divider point! Will have to consider how to handle this.