Recent container change broke ThreadSanitizer builds

Open timwoj opened this issue 2 years ago • 11 comments

Expected Behavior

C++ builds using ThreadSanitizer should complete correctly.

Real Behavior

ThreadSanitizer reports the following error when trying to run any binary:

FATAL: ThreadSanitizer: unexpected memory mapping 0x5bb456972000-0x5bb456973000

Related Info

This is a (tick one of the following):

  • [ ] Website issue
    • Link to page:
  • [x] Task issue
    • OS: Docker
    • Task name: https://cirrus-ci.com/task/5051526650003456

The log for the task above shows the configure script failing because it thinks that the OpenSSL headers and library differ. Manual investigation using terminal mode shows CMake failing because of the ThreadSanitizer error above. This failure started recently (within the last few weeks). It doesn't happen with Docker containers built from the same Dockerfile on other systems; it only happens to us on the Cirrus infrastructure. It appears similar to https://github.com/golang/go/issues/59418, which was caused by a kernel issue (fixed in https://go-review.googlesource.com/c/build/+/482195).
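Since the same image behaves differently depending on the host, it may help to record what the container can see of the host kernel when comparing a Cirrus run with a local one. The following is only a sketch: the function name `report_env` is made up for illustration, and printing `vm.mmap_rnd_bits` is an assumption, since high mmap ASLR entropy is one known trigger for this ThreadSanitizer message but nothing in this report confirms it as the cause here.

```shell
#!/bin/sh
# Print host-kernel facts that are visible from inside a container.
# report_env is a hypothetical helper name, not part of any CI tooling.
report_env() {
    echo "kernel: $(uname -r)"
    # vm.mmap_rnd_bits controls mmap ASLR entropy on Linux. The file may be
    # missing on some kernels, so fall back to "unknown".
    if [ -r /proc/sys/vm/mmap_rnd_bits ]; then
        echo "mmap_rnd_bits: $(cat /proc/sys/vm/mmap_rnd_bits)"
    else
        echo "mmap_rnd_bits: unknown"
    fi
}

report_env
```

Running this as an early script step in the failing task and again in a local container gives two values to compare side by side.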

timwoj avatar Sep 21 '23 16:09 timwoj

Yeah, Cirrus CI is using Container-Optimized OS version 105 for the x86 and Arm containers. You can set the experimental: true flag on the task that is failing. That way it will temporarily run on the old infrastructure.

Let's see if the next version of Container-Optimized OS will fix the issues.

fkorotkov avatar Sep 21 '23 17:09 fkorotkov

Yeah, Cirrus CI is using Container-Optimized OS version 105 for the x86 and Arm containers. You can set the experimental: true flag on the task that is failing. That way it will temporarily run on the old infrastructure.

Same result with the experimental tag. uname -a on that build says this:

Linux cirrus-ci-task-6181902449639424 5.15.120+ #1 SMP Fri Jul 21 03:39:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Is that correct?

Here's the task configuration:

tsan_sanitizer_task:
  experimental: true
  container:
    # Just uses a recent/common distro to run memory error/leak checks.
    dockerfile: ci/ubuntu-22.04/Dockerfile
    << : *SANITIZERS_RESOURCE_TEMPLATE

  << : *CI_TEMPLATE
  << : *SKIP_TASK_ON_PR
  env:
    ZEEK_CI_CONFIGURE_FLAGS: *TSAN_SANITIZER_CONFIG
    ZEEK_CI_DISABLE_SCRIPT_PROFILING: 1
    # If this is defined directly in the environment, configure fails to find
    # OpenSSL. Instead we define it with a different name and then give it
    # the correct name in the testing scripts.
    ZEEK_TSAN_OPTIONS: suppressions=/zeek/ci/tsan_suppressions.txt

I tried with the experimental tag in the container block too but that failed the same way. https://cirrus-ci.com/task/5715854843707392 has the last failure.

timwoj avatar Sep 21 '23 18:09 timwoj

Could you please try privileged: true for your container instance then? This way a dedicated VM will be used for running your task. It will be a bit slower to schedule, but the host will be running Ubuntu.

fkorotkov avatar Sep 21 '23 18:09 fkorotkov

privileged: true in the outer task block and with experimental removed?

tsan_sanitizer_task:
  privileged: true
  container:
    # Just uses a recent/common distro to run memory error/leak checks.
    dockerfile: ci/ubuntu-22.04/Dockerfile
    << : *SANITIZERS_RESOURCE_TEMPLATE

  << : *CI_TEMPLATE
  << : *SKIP_TASK_ON_PR
  env:
    ZEEK_CI_CONFIGURE_FLAGS: *TSAN_SANITIZER_CONFIG
    ZEEK_CI_DISABLE_SCRIPT_PROFILING: 1
    # If this is defined directly in the environment, configure fails to find
    # OpenSSL. Instead we define it with a different name and then give it
    # the correct name in the testing scripts.
    ZEEK_TSAN_OPTIONS: suppressions=/zeek/ci/tsan_suppressions.txt

That gets me through the configure step, but the build fails for the same reason when it tries to run a binary as part of the build:

[ 15%] [BIFCL] Processing /zeek/auxil/zeek-af_packet-plugin/src/af_packet.bif
FATAL: ThreadSanitizer: unexpected memory mapping 0x5ad5e496e000-0x5ad5e4973000

https://cirrus-ci.com/task/5992372387971072?logs=build#L967

timwoj avatar Sep 21 '23 18:09 timwoj

That gets me through the configure step, but the build fails for the same reason when it tries to run a binary as part of the build:

I re-ran the build this morning to double-check something and it failed during configure again.

timwoj avatar Sep 22 '23 16:09 timwoj

If it still fails with ThreadSanitizer, then it might not be an issue with COS version 105. I found another old report of a similar issue https://github.com/google/sanitizers/issues/806 where the problem was in the old version of gcc.

If you have an x86 host with docker you might try to reproduce the issue using gcr.io/cirrus-ci-community/zeek/zeek/ci/ubuntu-2204/dockerfile:dae6979fc92dcba631e38ce7cf2335a7 container that is used in CI.
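A rough sketch of what that local reproduction could look like, assuming Docker is installed and the image can be pulled from your machine (neither is confirmed here). The helper name `run_in_ci_image` and the `RUN_IN_CI_IMAGE` switch are made up for illustration, and the image may define an entrypoint that changes how the command is interpreted.

```shell
#!/bin/sh
# Image tag quoted from the comment above.
IMAGE="gcr.io/cirrus-ci-community/zeek/zeek/ci/ubuntu-2204/dockerfile:dae6979fc92dcba631e38ce7cf2335a7"

# Run a command inside the CI image (hypothetical helper). With no arguments
# it prints the kernel version the container sees, for comparison with the
# uname output from the Cirrus CI task.
run_in_ci_image() {
    if [ "$#" -eq 0 ]; then
        set -- uname -a
    fi
    docker run --rm "$IMAGE" "$@"
}

# Only touch Docker when explicitly asked (RUN_IN_CI_IMAGE is a hypothetical
# switch), so that sourcing this file has no side effects.
if [ "${RUN_IN_CI_IMAGE:-0}" = "1" ]; then
    run_in_ci_image "$@"
fi
```

For example, saving this as `repro.sh` and running `RUN_IN_CI_IMAGE=1 sh repro.sh uname -r` would print the local host's kernel version as seen from inside the container.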

fkorotkov avatar Sep 22 '23 16:09 fkorotkov

I found another old report of a similar issue https://github.com/google/sanitizers/issues/806 where the problem was in the old version of gcc.

I've tried it with both gcc 11 (ubuntu 22) and 12 (ubuntu 23), so I don't think that's it.

If you have an x86 host with docker you might try to reproduce the issue using gcr.io/cirrus-ci-community/zeek/zeek/ci/ubuntu-2204/dockerfile:dae6979fc92dcba631e38ce7cf2335a7 container that is used in CI.

I'll see if I can scrounge up an old system to test it with.

timwoj avatar Sep 22 '23 16:09 timwoj

We are also running into this. It should be trivial to reproduce with: echo 'void main(void){}' | gcc -pie -fPIE -fsanitize=thread -xc - -ltsan && ./a.out:

FATAL: ThreadSanitizer: unexpected memory mapping 0x56ce963d3000-0x56ce963d4000
Exit status: 66

See https://cirrus-ci.com/task/6173534590861312?logs=test#L2

Using gcc-13 from Ubuntu 23.10 (beta).

I understand that this is likely possible to fix by using a full GCE VM, but it would be nice if tsan in containers was supported again on Cirrus CI, like before.
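For anyone who wants to run the reproduction as a CI step, here is a slightly longer sketch of the same one-liner that labels the outcome. The function names are hypothetical, `int main` replaces `void main` only to keep the program standard C, and the explicit `-ltsan` is dropped on the assumption that `-fsanitize=thread` already links the runtime. The value 66 is ThreadSanitizer's default exit code, matching the exit status shown above.

```shell
#!/bin/sh
# Map an exit status to a short label. 66 is ThreadSanitizer's default
# exit code, which matches the "Exit status: 66" in the log above.
classify_status() {
    case "$1" in
        0)  echo "ok" ;;
        66) echo "tsan-fatal" ;;
        *)  echo "other" ;;
    esac
}

# Build and run a trivial TSan-instrumented program in a scratch directory.
run_repro() {
    dir=$(mktemp -d) || return 1
    printf 'int main(void){return 0;}\n' > "$dir/repro.c"
    # A missing compiler or missing TSan runtime is reported as a build failure.
    if ! gcc -pie -fPIE -fsanitize=thread -o "$dir/repro" "$dir/repro.c" 2>"$dir/build.log"; then
        echo "build-failed"
        rm -rf "$dir"
        return 0
    fi
    "$dir/repro"
    status=$?
    classify_status "$status"
    rm -rf "$dir"
}

run_repro
```

On an affected host this should print the FATAL message followed by `tsan-fatal`; on an unaffected one it should print `ok`.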

maflcko avatar Oct 03 '23 16:10 maflcko

I checked for https://github.com/google/sanitizers/issues/877#issuecomment-343644727 but that didn't seem to be the cause here either.

maflcko avatar Oct 03 '23 17:10 maflcko

I just wanted to check in and note that this is still broken.

timwoj avatar Jan 08 '24 19:01 timwoj

As a temporary workaround, I think clang-18 from Ubuntu Noble 24.04 may work, instead of gcc.

maflcko avatar Apr 24 '24 11:04 maflcko

We ended up moving to Ubuntu 24 as well, which resolved our problems.

timwoj avatar Aug 12 '24 16:08 timwoj