dynamorio
dynamorio copied to clipboard
Tests failing with "Encoding size 2 != instr size 1": drcacheoff.invariant_checker, drcachesim.opcode_mix
Happened on x86-64 here: https://github.com/DynamoRIO/dynamorio/actions/runs/6113660426/job/16593592056?pr=6302
390: CMake Error at /home/runner/work/dynamorio/dynamorio/suite/tests/runmulti.cmake:111 (message):
390: *** postcmd failed (Subprocess aborted): Encoding size 2 != instr size 1
390:
390: drcachesim:
390: /home/runner/work/dynamorio/dynamorio/clients/drcachesim/reader/reader.cpp:203:
390: virtual bool dynamorio::drmemtrace::reader_t::process_input_entry():
390: Assertion `false' failed.
390:
390: ***
390:
390: Call Stack (most recent call first):
390: /home/runner/work/dynamorio/dynamorio/suite/tests/runmulti.cmake:123 (process_cmdline)
390:
390:
389/471 Test #390: code_api|tool.drcacheoff.invariant_checker .......................***Failed 2.08 sec
Happened in the x86-32 run of tool.drcachesim.opcode_mix on the PR #6313: https://github.com/DynamoRIO/dynamorio/actions/runs/6227837689/job/16903332100?pr=6313
Actually I can reproduce this on my wkst on off.invar so I will try to figure it out.
52953 42999: 1958180 ifetch 6 byte(s) @ 0x00007fe4a353703d 48 3d 00 f0 ff ff cmp %rax, $0xfffff000
52954 43000: 1958180 ifetch 2 byte(s) @ 0x00007fe4a3537043 77 33 jnbe $0x00007fe4a3537078 (untaken)
52955 43000: 1958180 <marker: chunk footer #42>
Encoding size 2 != instr size 1
drcachesim: /usr/local/google/home/bruening/dr/git/src/clients/drcachesim/reader/reader.cpp:203: virtual bool dynamorio::drmemtrace::reader_t::process_input_entry(): Assertion `false' failed.
$ unzip -p suite/tests/tool.drcacheoff.invariant_checker.simple_app.1958180.1038.dir/trace/t*.zip chunk.0043 | od -A x -t x2 -w12 | awk '{printf "%s | %s %s %s%s%s%s\n", $1, $2, $3, $7, $6, $5, $4}' | head
000000 | 001c 0016 000000000000cedb
00000c | 001c 0002 002f644f0958421d
000018 | 001c 0003 0000000000000000
000024 | 002f 0001 00000000000000c3 encoding
000030 | 001c 001d 00007fe4a3521d46 branch target
00003c | 002f 0001 00000000000000c3 encoding repeated!
000048 | 0010 0001 00007fe4a3537045
The encoding is repeated across the indirect branch marker, and the encoding entries are combined with no instr in between. So the problem is in raw2trace inserted another encoding around the new marker type.
I reproduced in raw2trace_unit_tests: requires a chunk splitting a branch;ind-branch pair:
[drmemtrace]: Creating new chunk #2 for thread 1
[drmemtrace]: Chunk #3 ord marker 20
[drmemtrace]: Adding post-chunk-boundary encoding entry for decode=0x560ca99ed582 app=0x1a
[drmemtrace]: Appended encoding entry for 0x560ca99ed582 sz=1 0x000000c3...
[drmemtrace]: Chunk instr count is now 1
[drmemtrace]: Appended encoding entry for 0x560ca99ed576 sz=3 0x00c28b48...
[drmemtrace]: Appended instr fetch for original 0xe
<...>
35 type: 28 size: 21 val: 1
36 type: 28 size: 22 val: 20
37 type: 28 size: 2 val: 5
38 type: 28 size: 3 val: 0
39 type: 47 size: 1 val: 195
40 type: 28 size: 29 val: 14
41 type: 47 size: 1 val: 195
42 type: 16 size: 1 val: 26
After fix in unit test:
[drmemtrace]: Creating new chunk #2 for thread 1
[drmemtrace]: Chunk #3 ord marker 20
[drmemtrace]: Chunk instr count is now 1
[drmemtrace]: Appended encoding entry for 0x5620d9117576 sz=3 0x00c28b48...
[drmemtrace]: Appended instr fetch for original 0xe
<...>
35 type: 28 size: 21 val: 1
36 type: 28 size: 22 val: 20
37 type: 28 size: 2 val: 5
38 type: 28 size: 3 val: 0
39 type: 47 size: 1 val: 195
40 type: 28 size: 29 val: 14
41 type: 16 size: 1 val: 26
And the invariant test also works:
$ rm -rf suite/tests/tool.drcacheoff.invariant_checker.simple_app.1958180.1038.dir/trace; bin64/drrun -t drcachesim -chunk_instr_count 1000 -indir suite/tests/tool.drcacheoff.invariant_checker.simple_app.1958180.1038.dir -simulator_type view -sim_refs 14 -skip_instrs 42997
Output format:
<--record#-> <--instr#->: <---tid---> <record details>
------------------------------------------------------------
52944 42997: 1958180 <marker: timestamp 13339614527339825>
52945 42997: 1958180 <marker: tid 1958180 on core 11>
52946 42998: 1958180 ifetch 2 byte(s) @ 0x00007fe4a353703b 0f 05 syscall
52947 42998: 1958180 <marker: timestamp 13339614527373839>
52948 42998: 1958180 <marker: tid 1958180 on core 0>
52949 42998: 1958180 <marker: system call 257>
52950 42998: 1958180 <marker: maybe-blocking system call>
52951 42998: 1958180 <marker: timestamp 13339614527373853>
52952 42998: 1958180 <marker: tid 1958180 on core 0>
52953 42999: 1958180 ifetch 6 byte(s) @ 0x00007fe4a353703d 48 3d 00 f0 ff ff cmp %rax, $0xfffff000
52954 43000: 1958180 ifetch 2 byte(s) @ 0x00007fe4a3537043 77 33 jnbe $0x00007fe4a3537078 (untaken)
52955 43000: 1958180 <marker: chunk footer #42>
52956 43001: 1958180 ifetch 1 byte(s) @ 0x00007fe4a3537045 c3 ret
52957 43001: 1958180 read 8 byte(s) @ 0x00007ffc4cdf8638 by PC 0x00007fe4a3537045
This still happens in drcachesim.opcode_mix: https://github.com/DynamoRIO/dynamorio/actions/runs/6489025607/job/17622554375?pr=6357
307: Hello, world!
307: Encoding size 2 != instr size 6
307: drcachesim: /home/runner/work/dynamorio/dynamorio/clients/drcachesim/reader/reader.cpp:203: virtual bool dynamorio::drmemtrace::reader_t::process_input_entry(): Assertion `false' failed.
Happened again: https://github.com/DynamoRIO/dynamorio/actions/runs/8516627602/job/23325947839?pr=6749
309: Hello, world!
309: Encoding size 2 != instr size 6 for PC 0xf7b2157c at ord 253874 instr 176794 last_timestamp=0x4c653edf
309: drcachesim: /home/runner/work/dynamorio/dynamorio/clients/drcachesim/reader/reader.cpp:213: virtual bool dynamorio::drmemtrace::reader_t::process_input_entry(): Assertion `false' failed.
284/371 Test #309: code_api|tool.drcachesim.opcode_mix ..............................***Failed
Xref #3320 with weird online record reorderings over named pipes. This drcacesim.opcode_mix is an online run with -instr_encodings
turned on.
Failed one time on #6691.
drcachesim.opcode_mix continues to fail periodically, in post-merge here: https://github.com/DynamoRIO/dynamorio/actions/runs/8572128463/job/23493944852
drcachesim.opcode_mix failed here: https://github.com/DynamoRIO/dynamorio/actions/runs/8575475599/job/23504525032 https://github.com/DynamoRIO/dynamorio/actions/runs/8600139259/job/23564492380 Eg,
2024-04-05T20:45:19.2858331Z 310: Hello, world!
2024-04-05T20:45:19.2859316Z 310: Encoding size 2 != instr size 6 for PC 0xf72ff12e at ord 254549 instr 177282 last_timestamp=0xd12e7567
2024-04-05T20:45:27.1286674Z 285/771 Test #310: code_api|tool.drcachesim.opcode_mix .......................................................***Failed Required r\
egular expression not found. Regex=[^Hello, world!
2024-04-05T20:45:27.1288417Z ---- <application exited with code 0> ----
drcachesim.opcode_mix failed here: https://github.com/DynamoRIO/dynamorio/actions/runs/8626033117/job/23643564020 https://github.com/DynamoRIO/dynamorio/actions/runs/8652727426/job/23726294897
Happened after 524 runs:
$ ctest --repeat-until-fail 10000 -R drcachesim.opcode_mix
----------------------------------------------------------
Hello, world!
Encoding size 2 != instr size 6 for PC 0xf7273edb at ord 206078 instr 146557 last_timestamp=0x657be903
Candidates from simple_app offline near the very end with PC page offset 0xedb:
196961 140601: 1164739 ifetch 6 byte(s) @ 0xf71f1edb 0f 85 cb 00 00 00 jnz $0xf71f1fac (untaken)
203588 144926: 1164739 ifetch 6 byte(s) @ 0xf723dedb 8b 9f 0c 41 00 00 mov 0x0000410c(%edi), %ebx
Probably that last one:
203584 144924: 1164739 ifetch 8 byte(s) @ 0xf723decd f0 0f b1 97 ec 40 00 lock cmpxchg %edx, 0x000040ec(%edi), %eax
203584 144924: 1164739 00
203585 144924: 1164739 read 4 byte(s) @ 0xf7360f20 by PC 0xf723decd
203586 144924: 1164739 write 4 byte(s) @ 0xf7360f20 by PC 0xf723decd
203587 144925: 1164739 ifetch 6 byte(s) @ 0xf723ded5 0f 85 25 01 00 00 jnz $0xf723e000 (untaken)
203588 144926: 1164739 ifetch 6 byte(s) @ 0xf723dedb 8b 9f 0c 41 00 00 mov 0x0000410c(%edi), %ebx
203589 144926: 1164739 read 4 byte(s) @ 0xf7360f40 by PC 0xf723dedb
203590 144927: 1164739 ifetch 2 byte(s) @ 0xf723dee1 85 db test %ebx, %ebx
Adding a little more info and re-running: it happened after 1345 runs.
Hello, world!
Encoding size 2 != instr size 6 for PC 0xf72d4edb at ord 206078 instr 146557 last_timestamp=0x9294503e enc: 0 0 inst type: 10
32-bit fits only 4 encoding bytes per record: so the jnz would have a 2nd encoding record with 0 0. Did the 2nd jnz encoding record arrive out of order, after the ifetch record for the mov? Is that possible: pipe buffer atomicity doesn't match kernel atomicity, or pipe buffer splitting is somehow messed up?
Changing named_pipe_t::get_atomic_write_size() to return 512 makes this happen much more frequently: after 13, 14 runs:
Encoding size 2 != instr size 6 for PC 0xf72933d7 at ord 206075 instr 146557 last_timestamp=0xbd6cdb40 enc: ff ff inst type: 13
Encoding size 2 != instr size 6 for PC 0xf728f3d7 at ord 206075 instr 146557 last_timestamp=0xc5d5ea22 enc: ff ff inst type: 13
Aha: is_ok_to_split_before() thinks it's ok to split before an encoding record: it doesn't consider multiple consecutive encodings!
With the fix, the test passes 10,000 times in a row with the 512 pipe setting:
$ ctest --repeat-until-fail 10000 -R drcachesim.opcode_mix
...
Start 310: code_api|tool.drcachesim.opcode_mix
Test #310: code_api|tool.drcachesim.opcode_mix ... Passed 1.72 sec
Start 310: code_api|tool.drcachesim.opcode_mix
1/1 Test #310: code_api|tool.drcachesim.opcode_mix ... Passed 1.79 sec
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 17341.30 sec