abort in hydra-eval-jobs: Collecting from unknown thread

Open 2xsaiko opened this issue 2 years ago • 3 comments

Describe the bug

Evaluating my jobset (configuration for some of my NixOS systems) causes hydra-eval-jobs to abort in GC_push_all_stacks with the message "Collecting from unknown thread" (pthread_stop_world.c:754 in libgc.so). (Edit: it does not always abort! It ended up evaluating successfully after crashing about 30 times.) Executing the build manually with nix build finishes without problems.

To Reproduce

Steps to reproduce the behavior:

  1. Evaluate flake git+https://git.dblsaiko.net/systems (might be important that hydra is running on aarch64?)
  2. hydra-eval-jobs should end up coredumping

Expected behavior

The process doesn't abort and finishes evaluating the jobset correctly.

Hydra Server:

Please fill out this data as well as you can, but don't worry if you can't -- just do your best.

  • OS and version: NixOS 21.11.20220325.d89f18a
  • Version of Hydra: 2021-08-11
  • Version of Nix Hydra is built against: nix-2.5pre20211206_d1aaa7e
  • Version of the Nix daemon: 2.5.0pre20211206_d1aaa7e

Additional context

Here's the core dump log from systemd. The exact stack trace differs each time, but it always ends in the path from GC_malloc_kind_global down to GC_push_all_stacks, where it aborts.

Mar 29 14:11:00 spike hydra-evaluator[911]: starting evaluation of jobset ‘systems:master (jobset#5)’ (last checked 60 s ago)
Mar 29 14:11:01 spike nix-daemon[1083]: accepted connection from pid 118191, user hydra
Mar 29 14:11:01 spike nix-daemon[1083]: accepted connection from pid 118215, user hydra
Mar 29 14:11:04 spike systemd[1]: Started Process Core Dump (PID 121035/UID 0).
Mar 29 14:11:07 spike systemd-coredump[121036]: [🡕] Process 118215 (hydra-eval-jobs) of user 122 dumped core.
                                                
                                                Found module linux-vdso.so.1 with build-id: 2df0e272c95568f51a3a1921a822b54330132699
                                                Found module libnss_dns.so.2 with build-id: 62c5a4fcec5da4f113241fda61ea1afc9b2d683d
                                                Found module libattr.so.1 without build-id.
                                                Found module libresolv.so.2 with build-id: 0fe000e8a1dcb24d46153bd487b7c4ef3c0200fd
                                                Found module libkeyutils.so.1 without build-id.
                                                Found module libkrb5support.so.0 without build-id.
                                                Found module libxml2.so.2 without build-id.
                                                Found module libbz2.so.1 without build-id.
                                                Found module libzstd.so.1 without build-id.
                                                Found module liblzma.so.5 without build-id.
                                                Found module libacl.so.1 without build-id.
                                                Found module libbrotlicommon.so.1 without build-id.
                                                Found module libaws-c-common.so.1 without build-id.
                                                Found module libaws-c-sdkutils.so.1.0.0 without build-id.
                                                Found module libaws-c-cal.so.1.0.0 without build-id.
                                                Found module libaws-c-compression.so.1.0.0 without build-id.
                                                Found module libs2n.so without build-id.
                                                Found module libaws-c-io.so.1.0.0 without build-id.
                                                Found module libaws-c-http.so.1.0.0 without build-id.
                                                Found module libaws-c-auth.so.1.0.0 without build-id.
                                                Found module libaws-c-s3.so.0unstable without build-id.
                                                Found module libaws-checksums.so.1.0.0 without build-id.
                                                Found module libaws-c-event-stream.so.1.0.0 without build-id.
                                                Found module libaws-c-mqtt.so.1.0.0 without build-id.
                                                Found module libaws-crt-cpp.so without build-id.
                                                Found module libcom_err.so.3 without build-id.
                                                Found module libk5crypto.so.3 without build-id.
                                                Found module libkrb5.so.3 without build-id.
                                                Found module libgssapi_krb5.so.2 without build-id.
                                                Found module libssl.so.1.1 with build-id: 85e435fd52ba4ac684891bacf3e582cf2e317e3b
                                                Found module libssh2.so.1 without build-id.
                                                Found module libnghttp2.so.14 without build-id.
                                                Found module libz.so.1 without build-id.
                                                Found module librt.so.1 with build-id: f364ce33c59cd7955db929de439acb10e8053792
                                                Found module libarchive.so.13 without build-id.
                                                Found module libbrotlidec.so.1 without build-id.
                                                Found module libbrotlienc.so.1 without build-id.
                                                Found module libseccomp.so.2 without build-id.
                                                Found module libaws-cpp-sdk-core.so without build-id.
                                                Found module libaws-cpp-sdk-s3.so without build-id.
                                                Found module libaws-cpp-sdk-transfer.so without build-id.
                                                Found module libsodium.so.23 with build-id: 37c3dea45982807673d873baeb8b9c37856ac97a
                                                Found module libcurl.so.4 with build-id: 12782cce4baa4c59fb2640b20503b365128990ab
                                                Found module libsqlite3.so.0 with build-id: 9882a19eb1366385271d43056659352c62e678c0
                                                Found module libnixfetchers.so with build-id: 702e2e886dfec67980b900fc0e9ef548a4ee06a9
                                                Found module libboost_context.so.1.69.0 without build-id.
                                                Found module libcrypto.so.1.1 with build-id: 840d07f85182906a77458e2574e413149009446a
                                                Found module libc.so.6 with build-id: 2ec2584e7cf41bf9a28433370c4fbdac47cc8634
                                                Found module libgcc_s.so.1 without build-id.
                                                Found module libm.so.6 with build-id: 729324a0809db558a202d1a0244a1e0263031859
                                                Found module libstdc++.so.6 without build-id.
                                                Found module libnixutil.so with build-id: 988b87dc322421b48747aecf9bbabd84613a03c2
                                                Found module libnixstore.so with build-id: b56319b1246b8750a75ba295d139ec2c0edf8082
                                                Found module libdl.so.2 with build-id: a046e4bc181def5e579fac40210507736492f350
                                                Found module libpthread.so.0 with build-id: 7f4b6b86e1f1dcb6c793869a655c23cf82b1f45c
                                                Found module libgc.so.1 with build-id: 6dd737074c128cc3ef070a7f9a65313b3fe6461d
                                                Found module libnixexpr.so with build-id: da78f552b9b5f7d3ee06edf0da865a6c5e162017
                                                Found module libnixmain.so with build-id: 66f96d7d7a0d5f6de353a8114472c181ac3875a4
                                                Found module hydra-eval-jobs without build-id.
                                                Stack trace of thread 118215:
                                                #0  0x0000ffffb81a6c20 raise (libc.so.6 + 0x36c20)
                                                #1  0x0000ffffb8194678 abort (libc.so.6 + 0x24678)
                                                #2  0x0000ffffb8a71f6c GC_push_all_stacks (libgc.so.1 + 0x1cf6c)
                                                #3  0x0000ffffb8a6d6d4 GC_mark_some (libgc.so.1 + 0x186d4)
                                                #4  0x0000ffffb8a6d878 GC_stopped_mark (libgc.so.1 + 0x18878)
                                                #5  0x0000ffffb8a6ee4c GC_try_to_collect_inner (libgc.so.1 + 0x19e4c)
                                                #6  0x0000ffffb8a6f254 GC_collect_or_expand (libgc.so.1 + 0x1a254)
                                                #7  0x0000ffffb8a6f68c GC_allocobj (libgc.so.1 + 0x1a68c)
                                                #8  0x0000ffffb8a6fa68 GC_generic_malloc_inner (libgc.so.1 + 0x1aa68)
                                                #9  0x0000ffffb8a73650 GC_generic_malloc (libgc.so.1 + 0x1e650)
                                                #10 0x0000ffffb8a73a18 GC_malloc_kind_global (libgc.so.1 + 0x1ea18)
                                                #11 0x0000ffffb8a74ddc GC_strndup (libgc.so.1 + 0x1fddc)
                                                #12 0x0000ffffb8d87228 _ZN3nix8mkStringERNS_5ValueESt17basic_string_viewIcSt11char_traitsIcEERKSt3setINSt7__cxx1112basic_stringIcS4_SaIcEEESt4lessISA_ESaISA_EE (libnixexpr.so + 0xa0228)
                                                #13 0x0000ffffb8d95040 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xae040)
                                                #14 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #15 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #16 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #17 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #18 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #19 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #20 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #21 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #22 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #23 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #24 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #25 0x000000000043e914 _ZN3nix9EvalState10forceValueERNS_5ValueERKNS_3PosE (hydra-eval-jobs + 0x3e914)
                                                #26 0x0000ffffb8e22998 _ZN3nixL21prim_derivationStrictERNS_9EvalStateERKNS_3PosEPPNS_5ValueERS5_ (libnixexpr.so + 0x13b998)
                                                #27 0x0000ffffb8d8edf8 _ZN3nix9EvalState12callFunctionERNS_5ValueEmPPS1_S2_RKNS_3PosE (libnixexpr.so + 0xa7df8)
                                                #28 0x0000ffffb8d900ec _ZN3nix8ExprCall4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xa90ec)
                                                #29 0x0000ffffb8e1fd2c _ZN3nix12prim_getAttrERNS_9EvalStateERKNS_3PosEPPNS_5ValueERS5_ (libnixexpr.so + 0x138d2c)
                                                #30 0x0000ffffb8d8edf8 _ZN3nix9EvalState12callFunctionERNS_5ValueEmPPS1_S2_RKNS_3PosE (libnixexpr.so + 0xa7df8)
                                                #31 0x0000ffffb8d900ec _ZN3nix8ExprCall4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xa90ec)
                                                #32 0x000000000043e914 _ZN3nix9EvalState10forceValueERNS_5ValueERKNS_3PosE (hydra-eval-jobs + 0x3e914)
                                                #33 0x0000ffffb8d916f8 _ZN3nix10ExprSelect4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaa6f8)
                                                #34 0x0000ffffb8d92220 _ZN3nix10ExprAssert4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xab220)
                                                #35 0x000000000043e914 _ZN3nix9EvalState10forceValueERNS_5ValueERKNS_3PosE (hydra-eval-jobs + 0x3e914)
                                                #36 0x0000ffffb8d9469c _ZN3nix9EvalState14coerceToStringERKNS_3PosERNS_5ValueERSt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessISC_ESaISC_EEbbb (libnixexpr.so + 0xad69c)
                                                #37 0x0000ffffb8d94a68 _ZN3nix9EvalState14coerceToStringERKNS_3PosERNS_5ValueERSt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessISC_ESaISC_EEbbb (libnixexpr.so + 0xada68)
                                                #38 0x0000ffffb8e22a48 _ZN3nixL21prim_derivationStrictERNS_9EvalStateERKNS_3PosEPPNS_5ValueERS5_ (libnixexpr.so + 0x13ba48)
                                                #39 0x0000ffffb8d8edf8 _ZN3nix9EvalState12callFunctionERNS_5ValueEmPPS1_S2_RKNS_3PosE (libnixexpr.so + 0xa7df8)
                                                #40 0x0000ffffb8d900ec _ZN3nix8ExprCall4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xa90ec)
                                                #41 0x0000ffffb8e1fd2c _ZN3nix12prim_getAttrERNS_9EvalStateERKNS_3PosEPPNS_5ValueERS5_ (libnixexpr.so + 0x138d2c)
                                                #42 0x0000ffffb8d8edf8 _ZN3nix9EvalState12callFunctionERNS_5ValueEmPPS1_S2_RKNS_3PosE (libnixexpr.so + 0xa7df8)
                                                #43 0x0000ffffb8d900ec _ZN3nix8ExprCall4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xa90ec)
                                                #44 0x000000000043e914 _ZN3nix9EvalState10forceValueERNS_5ValueERKNS_3PosE (hydra-eval-jobs + 0x3e914)
                                                #45 0x0000ffffb8d916f8 _ZN3nix10ExprSelect4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaa6f8)
                                                #46 0x0000ffffb8d92220 _ZN3nix10ExprAssert4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xab220)
                                                #47 0x000000000043e914 _ZN3nix9EvalState10forceValueERNS_5ValueERKNS_3PosE (hydra-eval-jobs + 0x3e914)
                                                #48 0x0000ffffb8d9469c _ZN3nix9EvalState14coerceToStringERKNS_3PosERNS_5ValueERSt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessISC_ESaISC_EEbbb (libnixexpr.so + 0xad69c)
                                                #49 0x0000ffffb8d94a68 _ZN3nix9EvalState14coerceToStringERKNS_3PosERNS_5ValueERSt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessISC_ESaISC_EEbbb (libnixexpr.so + 0xada68)
                                                #50 0x0000ffffb8d94784 _ZN3nix9EvalState14coerceToStringERKNS_3PosERNS_5ValueERSt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessISC_ESaISC_EEbbb (libnixexpr.so + 0xad784)
                                                #51 0x0000ffffb8e22a48 _ZN3nixL21prim_derivationStrictERNS_9EvalStateERKNS_3PosEPPNS_5ValueERS5_ (libnixexpr.so + 0x13ba48)
                                                #52 0x0000ffffb8d8edf8 _ZN3nix9EvalState12callFunctionERNS_5ValueEmPPS1_S2_RKNS_3PosE (libnixexpr.so + 0xa7df8)
                                                #53 0x0000ffffb8d900ec _ZN3nix8ExprCall4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xa90ec)
                                                #54 0x0000ffffb8e1fd2c _ZN3nix12prim_getAttrERNS_9EvalStateERKNS_3PosEPPNS_5ValueERS5_ (libnixexpr.so + 0x138d2c)
                                                #55 0x0000ffffb8d8edf8 _ZN3nix9EvalState12callFunctionERNS_5ValueEmPPS1_S2_RKNS_3PosE (libnixexpr.so + 0xa7df8)
                                                #56 0x0000ffffb8d900ec _ZN3nix8ExprCall4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xa90ec)
                                                #57 0x000000000043e914 _ZN3nix9EvalState10forceValueERNS_5ValueERKNS_3PosE (hydra-eval-jobs + 0x3e914)
                                                #58 0x0000ffffb8d916f8 _ZN3nix10ExprSelect4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaa6f8)
                                                #59 0x0000ffffb8d92220 _ZN3nix10ExprAssert4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xab220)
                                                #60 0x000000000043e914 _ZN3nix9EvalState10forceValueERNS_5ValueERKNS_3PosE (hydra-eval-jobs + 0x3e914)
                                                #61 0x0000ffffb8d9469c _ZN3nix9EvalState14coerceToStringERKNS_3PosERNS_5ValueERSt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessISC_ESaISC_EEbbb (libnixexpr.so + 0xad69c)
                                                #62 0x0000ffffb8d94a68 _ZN3nix9EvalState14coerceToStringERKNS_3PosERNS_5ValueERSt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessISC_ESaISC_EEbbb (libnixexpr.so + 0xada68)
                                                #63 0x0000ffffb8d94784 _ZN3nix9EvalState14coerceToStringERKNS_3PosERNS_5ValueERSt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessISC_ESaISC_EEbbb (libnixexpr.so + 0xad784)
Mar 29 14:11:07 spike systemd[1]: [email protected]: Deactivated successfully.
Mar 29 14:11:07 spike systemd[1]: [email protected]: Consumed 3.082s CPU time, no IP traffic.
Mar 29 14:11:07 spike hydra-evaluator[118188]: hydra-eval-jobs returned exit code 1:
Mar 29 14:11:07 spike hydra-evaluator[118188]: Collecting from unknown thread
Mar 29 14:11:07 spike hydra-evaluator[118188]: error: unexpected EOF reading a line
Mar 29 14:11:07 spike hydra-evaluator[911]: evaluation of jobset ‘systems:master (jobset#5)’ failed with exit code 1

2xsaiko, Mar 29 '22 12:03

Same issue on https://github.com/NixOS/nixpkgs/commit/90cd5459a1fd707819b9a3fb9c852beaaac3b79a, also aarch64.

misuzu, Jun 12 '22 18:06

I hit this problem today and managed to work around it by adding a GC_DONT_GC environment variable to hydra-evaluator.service (something like https://gitlab.com/highsunz/flames/-/commit/9cd2a0a3f48abb0c5c57d3ee049f72e31cf1ec2e).

This workaround comes from https://github.com/NixOS/nix/issues/4178#issuecomment-738886808.
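
For reference, here is a minimal sketch of what this workaround might look like in a NixOS configuration. It assumes Hydra is deployed through the NixOS module, so that the evaluator's systemd unit is named hydra-evaluator. The value "1" is an arbitrary choice; as far as I can tell, the Boehm GC only checks whether the variable is set. This is not necessarily what the linked commit does.

```nix
{
  # Assumption: Hydra runs via the NixOS module, whose evaluator unit is
  # hydra-evaluator.service. hydra-eval-jobs is started by that service
  # and inherits its environment.
  systemd.services.hydra-evaluator.environment = {
    # Turns off collection in the Boehm GC used by the Nix evaluator,
    # which sidesteps the "Collecting from unknown thread" abort.
    GC_DONT_GC = "1";
  };
}
```

Because this disables garbage collection for the evaluator, memory use during evaluation may grow noticeably on large jobsets.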

blurgyy, Aug 30 '22 11:08

Can confirm I've encountered this issue right after migrating my server from x86_64 to aarch64 (while keeping the same config). GC_DONT_GC does help.

chayleaf, Oct 18 '23 11:10