ARM suite test regressions: api.disA32, api.disT32, many more
ARM tests were not being run for a while. Now that we have a CDash machine back up, I compared its debug run from yesterday to the most recent run under that same machine name, from April 2018:
http://dynamorio.org/CDash/viewTest.php?buildid=35255
Testing started on 2018-04-18 23:37:13
Name Status Time Details Labels
code_api|client.drmgr-test Failed 5s 810ms Completed (Failed)
code_api|linux.signal_race Failed 1m 30s 690ms Completed (Failed)
code_api|linux.eintr-noinline Failed 1m 30s 220ms Completed (Failed)
code_api|tool.drcachesim.invariants Failed 1m 30s 410ms Completed (Failed)
code_api|tool.histogram.offline Failed 1m 31s 760ms Completed (Failed)
code_api|tool.drcachesim.delay-simple Failed 8s 340ms Completed (Failed)
http://dynamorio.org/CDash/viewTest.php?buildid=44194
Testing started on 2019-06-21 23:30:26
Name Status Time Details Labels
code_api|linux.signal_race Failed 1m 30s 20ms Completed (Failed)
code_api|linux.eintr-noinline Failed 1m 30s 180ms Completed (Failed)
 code_api|linux.sigsuspend Failed 1m 30s 100ms Completed (Failed)
 code_api|tool.drcacheoff.simple Failed 6s 770ms Completed (Failed)
 code_api|tool.drcacheoff.filter Failed 8s 820ms Completed (Failed)
 code_api|tool.drcacheoff.opcode_mix Failed 7s 620ms Completed (Failed)
 code_api|linux.syscall_pwait Failed 10s 890ms Completed (Failed)
 code_api|client.predicate-test Failed 1s 110ms Completed (Failed)
 code_api|client.drreg-test Failed 1s 520ms Completed (Failed)
code_api|client.drmgr-test Failed 3s 410ms Completed (Failed)
 code_api|client.drutil-test Failed 2s 420ms Completed (Failed)
 code_api|sample.opcodes Failed 510ms Completed (Failed)
code_api|tool.drcachesim.delay-simple Failed 4s 110ms Completed (Failed)
 code_api|tool.drcacheoff.multiproc Failed 12s 390ms Completed (Failed)
 code_api|api.disA32 Failed 1s 360ms Completed (Failed)
 code_api|api.disT32 Failed 1s 430ms Completed (Failed)
I'm assuming these regressions have happened because we don't have AArch32 precommit testing in the same way we have for AArch64?
Note: in a recent debug run code_api|linux.signal_race did not fail, so it is at least flaky.
In a recent release run we observe code_api|linux.reset failing as well. In addition, code_api|linux.sigsuspend and code_api|linux.signal_race passed in that run.
> I'm assuming these regressions have happened because we don't have AArch32 precommit testing in the same way we have for AArch64?
That's my assumption, but I wouldn't have expected the ARM encoder/decoder tests to have broken. Hopefully it's something simple.
The log message for c76c48562f6781fd38b34b7e367c9221a5bd6f4d (Oct 2021) says:
Adds missing required-1 bits in the ARM encoding table entries for
OP_blx, OP_bx, and OP_bxj. Without the bits, some hardware still
accepts the instructions (which is why we did not notice the problem
before), but they are technically unsound, and QEMU thinks they are
invalid, breaking some of our tests under QEMU.
That change is responsible for most of the api.disA32 failures, and perhaps that change should be reversed. It's my understanding that all hardware will accept those instructions: bits 8-19, which should be 1, will be ignored. Since these are branch instructions DynamoRIO really needs to decode and intercept them, I think. The problem with QEMU is perhaps a bug in QEMU. Perhaps it has even been fixed in the meantime.
The other api.dis failures are due to vadd.f32 and I have a patch for them coming soon, I hope.
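To make the "required-1 bits" point concrete, here is a minimal sketch of a strict versus a lenient match for the A32 encoding of BX (cond 0001 0010 1111 1111 1111 0001 Rm). The helper and macro names are hypothetical and are not taken from DynamoRIO's decoder tables; the condition field is deliberately not checked.

```c
#include <stdbool.h>
#include <stdint.h>

/* A32 "BX <Rm>" is encoded as: cond 0001 0010 1111 1111 1111 0001 Rm.
 * Bits 19-8 are the "should be one" (SBO) field discussed above.
 * All names here are hypothetical, for illustration only. */
#define BX_FIXED_MASK 0x0ff000f0u /* bits 27-20 and 7-4 */
#define BX_FIXED_BITS 0x01200010u /* their required values */
#define BX_SBO_MASK 0x000fff00u   /* bits 19-8 */

/* Strict match: the SBO bits must all be set. The condition field
 * (bits 31-28) is not examined in this sketch. */
static inline bool
is_bx_strict(uint32_t word)
{
    return (word & (BX_FIXED_MASK | BX_SBO_MASK)) == (BX_FIXED_BITS | BX_SBO_MASK);
}

/* Lenient match: the SBO bits are ignored. */
static inline bool
is_bx_lenient(uint32_t word)
{
    return (word & BX_FIXED_MASK) == BX_FIXED_BITS;
}
```

With these definitions, 0xe12fff1e (bx lr) matches both ways, while 0xe120001e (the same instruction with the SBO bits cleared) matches only the lenient check. That difference is the trade-off in question: a strict table entry rejects such words, a lenient one decodes them as branches.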
I did a bisection on common.segfault: it seems to have been broken by a8c4f6fbcb847f7dec80c4f86e3b7de33f332b27, and I can make it pass again on the head by changing RETCODE_SIZE in core/unix/signal_private.h from 16 to 8. However, I've seen unsigned long retcode[4] in a genuine kernel header file so maybe there is something else wrong that is somehow compensated for by the previous definition of RETCODE_SIZE.
Yes, I did see unsigned long retcode[4] in a genuine kernel header file, but it was part of sigframe, not part of rt_sigframe. There seems to be no retcode in an AArch32 rt_sigframe. I will draft a PR to remove it.
(A spurious retcode of length 8 bytes happens to be harmless because there are a couple of items on the stack beyond the rt_sigframe so copying 8 bytes from after the rt_sigframe does not segfault. Copying any more does segfault, which is particularly confusing in this case as we're already in the process of attempting to handle a segfault.)
In an arm32v7 Debian 10 Docker container under a 64-bit 5.4.47 kernel on Cortex-A72 hardware with DynamoRIO slightly modified (retcode removed from rt_sigframe) I got 114 out of 298 tests failing. One of the failing tests was drcacheoff.simple so I looked at that one under GDB. The stack trace looked like nonsense but the SIGBUS reported from an instance of ldrexd r0, r1, [r2] looked plausible: r2 contained an odd multiple of 4 but that instruction requires 8-byte alignment.
Perhaps it's obvious in hindsight what was happening, but I had to do things like insert home-made asm volatile("b.w .\n\t.inst 65") breakpoints into the source to discover that the problem seemed to be one of the two instances (probably the first one) in tracer.cpp of
placement = dr_global_alloc(MAX_INSTRU_SIZE);
instru = new (placement) offline_instru_t(
dr_global_alloc returns a value with HEAP_ALIGNMENT, which is just 4 on ARM, but offline_instru_t uses some library lock function that assumes the standard 8-byte alignment and uses ldrexd.
So I changed HEAP_ALIGNMENT to 8, added a couple of ALIGN_FORWARDs in the definition of BLOCK_SIZES, changed HEADER_SIZE to 8, replaced STANDARD_HEAP_ALIGNMENT - HEAP_ALIGNMENT with 4 in redirect_malloc, and commented out a few ASSERTs. With those changes only 72 out of 298 tests failed, so 42 tests were fixed by that hackery.
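As an illustration of the alignment problem, here is a minimal sketch of rounding a 4-byte-aligned address up to the 8-byte boundary that ldrexd needs. The function names are hypothetical; this is not DynamoRIO's ALIGN_FORWARD macro or its allocator, just the arithmetic involved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
static inline uintptr_t
align_forward(uintptr_t addr, size_t align)
{
    return (addr + (align - 1)) & ~(uintptr_t)(align - 1);
}

/* True if addr is a multiple of align (align must be a power of two). */
static inline bool
is_aligned(uintptr_t addr, size_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Given a raw block of raw_size bytes that is only guaranteed to be
 * 4-byte aligned, return an 8-byte-aligned pointer inside it that still
 * has at least `need` bytes available, or NULL if the block is too small.
 * Hypothetical helper: the caller would have to over-allocate by up to
 * 4 bytes and keep the raw pointer for freeing. */
static inline void *
place_aligned8(void *raw, size_t raw_size, size_t need)
{
    uintptr_t start = (uintptr_t)raw;
    uintptr_t aligned = align_forward(start, 8);
    if (aligned - start + need > raw_size)
        return NULL;
    return (void *)aligned;
}
```

An address such as 12, an odd multiple of 4, fails is_aligned(12, 8) and is moved to 16 by align_forward. Padding each call site this way is the alternative to raising the allocator's alignment globally, which is what the HEAP_ALIGNMENT change does in one place.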
(In order to build the tests I also had to add a bogus definition of gettid to burst_syscall_inject.cpp.)
Question for @derekbruening: Is changing HEAP_ALIGNMENT to 8 on ARM the right approach here?
(Since DynamoRIO has been broken on 32-bit ARM hardware since 2019, there is perhaps not much interest in it and we probably can't invest a huge amount of effort, so an easy fix would be preferred!)
> Question for @derekbruening: Is changing HEAP_ALIGNMENT to 8 on ARM the right approach here?
Maybe that is the simplest approach, rather than having to find all the C++ uses and replace them with __wrap_malloc(). Hopefully the DR allocator code just works if the alignment define is changed...
With 73b1df12b3484032045e07923668a7c834511469 plus a few changes that have not yet been merged, namely:
- Changing HEAP_ALIGNMENT to 8 (see above)
- Disabling DEBUG_ASSERT (#7132)
- Fixing drreg.c (I have a draft PR)
in an AArch32 Debian 10 container under a 64-bit 5.4.47 Linux kernel running on Cortex-A72, I got 47 out of 298 tests failing:
4 - tool.drcacheoff.raw2trace_unit_tests (Failed)
5 - tool.scheduler.unit_tests (Child aborted)
8 - tool.drcacheoff.trace_interval_analysis_unit_tests (Failed)
9 - tool.drcacheoff.analysis_unit_tests (Timeout)
20 - code_api|linux.sigaction.native (Failed)
85 - code_api|linux.sigmask (Failed)
89 - code_api|linux.fib-conflict (Failed)
90 - code_api|linux.fib-conflict-early (Failed)
93 - code_api|linux.vfork (Failed)
94 - code_api|linux.thread-reset (Failed)
100 - code_api|pthreads.pthreads_fork_FLAKY (Failed)
116 - code_api|client.detach_test (Timeout)
120 - code_api|client.file_io (Failed)
129 - code_api|client.drmgr-test (Failed)
136 - code_api|client.drbbdup-drwrap-test (Failed)
140 - code_api|client.drreg-test (Failed)
141 - code_api|client.drreg-end-restore (Failed)
154 - code_api|client.drutil-test (Failed)
172 - code_api|sample.statecmp (Failed)
194 - code_api|tool.drcachesim.delay-global (Failed)
213 - code_api|tool.drcacheoff.raw-zlib (Failed)
217 - code_api|tool.drcacheoff.multiproc (Timeout)
225 - code_api|tool.drcacheoff.sysnums (Failed)
229 - code_api|tool.drcacheoff.max-global (Failed)
235 - code_api|tool.drcacheoff.windows-invar (Failed)
253 - code_api|tool.drcacheoff.invariant_checker (Failed)
254 - code_api|tool.drcacheoff.invariant_checker_pthreads (Failed)
255 - code_api|tool.histogram.offline (Failed)
256 - code_api|tool.record_filter (Failed)
257 - code_api|tool.record_filter_bycore_uni (Failed)
258 - code_api|tool.record_filter_bycore_multi (Failed)
259 - code_api|tool.drcachesim.drstatecmp-delay-simple (Failed)
260 - code_api|tool.drcacheoff.drstatecmp-delay-simple (Failed)
263 - code_api|tool.drcacheoff.getretaddr_record_replace_retaddr (Failed)
272 - code_api|client.drwrap-test-detach (Failed)
274 - code_api|api.disA32 (Failed)
276 - code_api|api.startstop (Failed)
277 - code_api|api.detach (Failed)
278 - code_api|api.detach_spawn_FLAKY (Failed)
279 - code_api|api.detach_spawn_stress_FLAKY (Failed)
289 - code_api|api.static_sideline_FLAKY (Failed)
293 - code_api|api.thread_churn (Failed)
294 - code_api|tool.drcacheoff.legacy (Failed)
295 - code_api|tool.drcacheoff.func_view_noret (Failed)
296 - code_api|tool.drcacheoff.altbindir (Failed)
297 - code_api|tool.drcacheoff.legacy-int-offs (Failed)
298 - no_code_api,no_intercept_all_signals|linux.sigaction (Failed)
These are the tests that consistently failed on AArch32 with 632f8d1d8ee2eb9fd6a3027d83cea6845d8d66b2 (2025-01-30) on Debian 10 (released in 2019):
4 - tool.drcacheoff.raw2trace_unit_tests (Failed)
5 - tool.scheduler.unit_tests (Child aborted)
8 - tool.drcacheoff.trace_interval_analysis_unit_tests (Failed)
10 - tool.drcachesim.decode_cache_test (SEGFAULT)
91 - code_api|linux.fib-conflict (Failed)
92 - code_api|linux.fib-conflict-early (Failed)
102 - code_api|pthreads.pthreads_fork_FLAKY (Failed)
122 - code_api|client.file_io (Failed)
131 - code_api|client.drmgr-test (Failed)
138 - code_api|client.drbbdup-drwrap-test (Failed)
142 - code_api|client.drreg-test (Failed)
143 - code_api|client.drreg-end-restore (Failed)
156 - code_api|client.drutil-test (Failed)
174 - code_api|sample.statecmp (Failed)
196 - code_api|tool.drcachesim.delay-global (Failed)
227 - code_api|tool.drcacheoff.sysnums (Failed)
231 - code_api|tool.drcacheoff.max-global (Failed)
237 - code_api|tool.drcacheoff.windows-invar (Failed)
256 - code_api|tool.drcacheoff.invariant_checker (Failed)
257 - code_api|tool.drcacheoff.invariant_checker_pthreads (Failed)
258 - code_api|tool.histogram.offline (Failed)
259 - code_api|tool.record_filter (Failed)
260 - code_api|tool.record_filter_bycore_uni (Failed)
261 - code_api|tool.record_filter_bycore_multi (Failed)
262 - code_api|tool.drcachesim.drstatecmp-delay-simple (Failed)
263 - code_api|tool.drcacheoff.drstatecmp-delay-simple (Failed)
266 - code_api|tool.drcacheoff.getretaddr_record_replace_retaddr (Failed)
275 - code_api|client.drwrap-test-detach (Failed)
277 - code_api|api.disA32 (Failed)
279 - code_api|api.startstop (Failed)
280 - code_api|api.detach (Failed)
281 - code_api|api.detach_spawn_FLAKY (Failed)
282 - code_api|api.detach_spawn_stress_FLAKY (Failed)
292 - code_api|api.static_sideline_FLAKY (Failed)
296 - code_api|api.thread_churn (Failed)
297 - code_api|tool.drcacheoff.legacy (Failed)
298 - code_api|tool.drcacheoff.func_view_noret (Failed)
In addition, the following tests seemed to fail consistently on one of the systems I tested on:
79 - code_api|linux.signal_race (Failed)
96 - code_api|linux.thread-reset (Failed)
118 - code_api|client.detach_test (Timeout)
219 - code_api|tool.drcacheoff.multiproc (Timeout)
I tested the above in Docker containers on two different AArch64 systems with different hardware and kernel versions. Some notes for anyone wanting to reproduce this:
- On most AArch64 systems you can run an AArch32 Docker container with:
  docker run --privileged arm32v7/debian:10 linux32 /bin/bash
- If you forget the linux32, the build fails with errors that include error: unknown type name '__uint128_t' and Error: ARM register expected.
- If you forget the --privileged and don't use --security-opt seccomp=unconfined instead, then two linux.sigaction tests fail because Docker by default disables certain system calls. Another test works around the absence of those system calls and so passes too easily in a default Docker container, so it is better to use one of those options when testing.
- tool.drcacheoff.analysis_unit_tests sometimes needs many attempts before it passes.