libc
static binary crashing with NULL symbol
target: ppc64le-unknown-linux-gnu & stable-x86_64-unknown-linux-gnu
A statically compiled Kata agent binary crashes in what looks like a function related to dynamic libraries. The binaries were built with the latest libc so that we could pick up the fix for static compilation on non-x86 architectures (https://github.com/rust-lang/libc/pull/2046).
However, we are seeing this: dl-call-libc-early-init.c:37: _dl_call_libc_early_init: Assertion `sym != NULL' failed
The place @amulyam24 has narrowed it down to is possibly in this openpty call: https://github.com/kata-containers/kata-containers/blob/main/src/agent/rustjail/src/container.rs#L881
I was once stuck on an issue with Go that involved ptys; in that case the TCSETS and TCGETS values were wrong for ppc64le. See https://github.com/golang/go/issues/19560. So I can't help but be suspicious, but the dl_ part of this makes me think it's probably not related.
Unfortunately, there's not a lot of info in the logs, and we haven't been able to find a small recreator. I'm opening this here in hopes that someone has some tips for getting more information.
fyi @fidencio
/cc @Jakob-Naucke
Hmm @clnperez. Just dumping a few thoughts:
Interestingly enough, openpty is also the call that I tracked #2033 (now fixed) down to.
That function _dl_call_libc_early_init uses some static addresses (mirror here), but I don't see why that would be an issue on its own. However, it is relatively new (glibc 2.32 I believe), is this limited to more recent glibc? It's a bit surprising to me since the osbuilder distros are all older than that (unless you upgraded?) and the container libc shouldn't affect this (or should it? From your logs, it appears to be before the pivot_root/execvp into the container.)
@tuliom FYI
Thanks @Jakob-Naucke. I am still using the older glibc in the default osbuilder distros. I think @amulyam24 may have tried a newer distro. IIUC, the container libc wouldn't unless this was dynamically compiled either way -- but I could be wrong!
However, it is relatively new (glibc 2.32 I believe), is this limited to more recent glibc?
Can you clarify what you're referring to here when you say "it is relatively new?"
@clnperez I meant: The file elf/dl-call-libc-early-init.c is in glibc 2.32, but not in glibc 2.31. Your error message refers to this file, but Kata's default osbuilder distros are older than that (CentOS 7 is on 2.17, Debian 9 is on 2.24, Fedora 30 is on 2.29, openSUSE 15 is on 2.26, Ubuntu 18.04 is on 2.27). I was wondering if this issue only occurred with osbuilder distros newer than that. Since you're saying you haven't upgraded: If you haven't already, would you mind trying a container image that uses glibc <2.32, such as ubuntu:latest (20.04)? Maybe it does have something to do with the container glibc. If the error message persists although both guest and container use glibc older than 2.32, I'd be really confused.
Since you're saying you haven't upgraded: If you haven't already, would you mind trying a container image that uses glibc <2.32, such as ubuntu:latest (20.04)? Maybe it does have something to do with the container glibc.
@Jakob-Naucke, to confirm, the error persists irrespective of the glibc version used by the guest or the container. (Tried with glibc < 2.32 in both: guest OS Fedora 30 with glibc 2.29, container Ubuntu 18.04 with glibc 2.27.)
Ah, it probably stems from the build host glibc (specifically due to the static linkage). I don't have an idea about the underlying issue though.
Ok, so, it turns out we did have a small recreator for this (thanks @Amulyam24). I just hadn't run it on a different host with a downlevel gcc. I can recreate this on my laptop (fedora 33) and a ppc64le system, so it at least isn't a power-only problem.
So I compiled it locally (fc33 on x86), and copied it into a fc32 container to run.
> rustup show
Default host: x86_64-unknown-linux-gnu
rustup home: /home/christy/.rustup
stable-x86_64-unknown-linux-gnu (default)
rustc 1.50.0 (cb75ad5db 2021-02-10)
use libc;
use std::{mem,ptr};

fn main() {
    let mut slave = mem::MaybeUninit::<libc::c_int>::uninit();
    let mut master = mem::MaybeUninit::<libc::c_int>::uninit();
    let p;
    unsafe {
        p = libc::openpty(
            master.as_mut_ptr(),
            slave.as_mut_ptr(),
            ptr::null_mut(),
            ptr::null_mut(),
            ptr::null_mut()
        );
    }
    println!("p:{}",p);
}
RUSTFLAGS="-C target-feature=+crt-static" cargo build
FROM fedora:32
RUN dnf install -y glibc
RUN curl https://sh.rustup.rs -sSf | sh -s -- -y
#RUN source $HOME/.cargo/env
ENV PATH="/root/.cargo/bin:$PATH"
RUN echo $PATH
# compiled with RUSTFLAGS="-C target-feature=+crt-static"
COPY openpty/target/debug/openpty .
RUN ./openpty
> docker build -t openpty:fc32 .
Sending build context to Docker daemon 23.53MB
Step 1/7 : FROM fedora:32
---> eb7f88a194d8
Step 2/7 : RUN dnf install -y glibc
---> Using cache
---> 5ea930089860
Step 3/7 : RUN curl https://sh.rustup.rs -sSf | sh -s -- -y
---> Using cache
---> 151e71ed5a86
Step 4/7 : ENV PATH="/root/.cargo/bin:$PATH"
---> Using cache
---> 1a2d67e93c69
Step 5/7 : RUN echo $PATH
---> Using cache
---> bdd9afd9e135
Step 6/7 : COPY openpty/target/debug/openpty .
---> 91e228a9c6be
Step 7/7 : RUN ./openpty
---> Running in 5506ca1edb92
openpty: dl-call-libc-early-init.c:37: _dl_call_libc_early_init: Assertion `sym != NULL' failed.
Just to make sure:
> ldd openpty/target/debug/openpty
not a dynamic executable
After talking with someone on our toolchain team, she suggested I look for anything calling _dl_open.
> nm -an openpty/target/debug/openpty | grep dl_open$
0000000000493280 T _dl_open
I'm not sure how to track that down. Hopefully some of this is helpful for some more digging at least.
It really seems this only happens with more recent glibc. While I didn't have the time for an exact bisection, I can e.g. reproduce this on Ubuntu 20.10 (2.32) and an Ubuntu 18.04 container (2.27), but I cannot reproduce it on RHEL 8.3 (2.28) and even the most archaic containers like CentOS 6 (2.12). (I used up-to-date 1.51 from Rustup and libc 0.2.92 in both cases.)
@Jakob-Naucke -- I may be misunderstanding your last comment. I thought you'd mostly root-caused this in this comment: that it would only happen with glibc 2.32 and later if built using an earlier glibc. But I'm also surprised that you could reproduce it with 2.27 in the Ubuntu 18.04 container.
if built using an earlier glibc
@clnperez No, all along I meant building it on a more recent glibc than the one it runs on. But my comment was very confusing :slightly_smiling_face:
Poking around a little more, I found that this happens when building with glibc >= 2.32 and running on glibc <= 2.31. I think it's also noteworthy that when you build dynamically instead, you will get the error
/lib64/libc.so.6: version `GLIBC_2.32' not found
The error _dl_call_libc_early_init: Assertion `sym != NULL' failed is only observed when building statically. However, you can e.g. build on 2.31 and run on 2.28, so not every version mismatch is fatal.
This is reproducible without Rust:
#include <stdio.h>
#include <stdlib.h>
#include <pty.h>

int main() {
    int* child = malloc(sizeof(int));
    int* parent = malloc(sizeof(int));
    int p = openpty(parent, child, 0, 0, 0);
    printf("p:%d\n", p);
    free(child);
    free(parent);
}
Built with gcc openpty.c -lutil -static on a 2.32 system and run on 2.31:
a.out: dl-call-libc-early-init.c:37: _dl_call_libc_early_init: Assertion `sym != NULL' failed.
Aborted (core dumped)
Thank you @Jakob-Naucke ! I reported this issue to glibc here: https://sourceware.org/bugzilla/show_bug.cgi?id=27790
I think it's also noteworthy that when you build dynamically instead, you will get the error /lib64/libc.so.6: version `GLIBC_2.32' not found
Running a program dynamically linked against a newer glibc on a system with an older glibc is unsupported because the older glibc can't support all the features of the new one. IMHO, the real question is the static case, which does make use of dynamic linking via dl_open().
I prefer to collect more details before giving you an answer.
From @tuliom in Bugzilla:
Interestingly, I can't reproduce this issue when the binary is statically linked against glibc 2.33 and executed on 2.31.
@clnperez I can reproduce this with Rust too. Building the Rust example against 2.33 and linking statically, the code runs just fine. So I think this is an issue with glibc 2.32 and no version before or after.
Thanks guys. It's good to know it's not just rust.
And @Jakob-Naucke, I flipped the versions around in my comment, so that only added to the confusion here. :D It does seem to be related only to that one addition in glibc 2.32.
Statically linked glibc binaries aren't very portable. In general, you need to run them on a system with exactly the same glibc version. (Dynamically linked binaries can run on newer glibc versions, too.)
In glibc 2.33, the behavior of openpty changed due to the commit, Linux: Require properly configured /dev/pts for PTYs, so openpty is safer to use in statically linked programs.
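As a rough illustration of the "exactly the same glibc version" rule above, here is a minimal Rust sketch that compares a build-time glibc version with the one on the running system. It assumes the `getconf GNU_LIBC_VERSION` command is available on the target (it prints something like `glibc 2.31` on glibc systems); the function names and the hard-coded build version are hypothetical, chosen only for this example. The check is deliberately conservative: as noted earlier in the thread, some mismatches (such as building on 2.31 and running on 2.28) happen to work.

```rust
use std::process::Command;

/// A glibc version as (major, minor), e.g. (2, 32).
type GlibcVersion = (u32, u32);

/// Parse strings such as "glibc 2.31" or "2.31" into (2, 31).
fn parse_glibc_version(text: &str) -> Option<GlibcVersion> {
    // Use the last whitespace-separated token so an optional "glibc " prefix is ignored.
    let token = text.split_whitespace().last()?;
    let mut parts = token.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Ask the running system for its glibc version via `getconf GNU_LIBC_VERSION`.
/// Returns None if the command is missing, fails, or prints something unexpected
/// (for example on a system that does not use glibc).
fn runtime_glibc_version() -> Option<GlibcVersion> {
    let output = Command::new("getconf")
        .arg("GNU_LIBC_VERSION")
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    parse_glibc_version(&String::from_utf8_lossy(&output.stdout))
}

/// Conservative rule from the thread: a statically linked glibc binary
/// should run on exactly the glibc version it was built against.
fn static_binary_is_safe(build: GlibcVersion, runtime: GlibcVersion) -> bool {
    build == runtime
}

fn main() {
    // Hypothetical build-time version; in practice, record it on the build host.
    let build = parse_glibc_version("glibc 2.32").expect("valid version string");
    match runtime_glibc_version() {
        Some(runtime) => println!(
            "built against {}.{}, running on {}.{}, safe for static glibc: {}",
            build.0,
            build.1,
            runtime.0,
            runtime.1,
            static_binary_is_safe(build, runtime)
        ),
        None => println!("could not determine the runtime glibc version"),
    }
}
```

Shelling out to `getconf` rather than calling into the linked libc is a deliberate choice here: it reports the glibc installed on the system, even when the probe itself is statically linked.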
@fweimer-rh -- That seems to indicate that there shouldn't be static agent binaries for kata at all then? /cc @fidencio
@clnperez I don't know your exact requirements. If you need static binaries for isolation from the host environment, you need to stick to a certain subset if using glibc. It so happens that openpty is in that subset only starting with glibc 2.33. There is no good way to check whether an application sticks to the subset, as every static link currently pulls in the dynamic loader (so its presence in the static binary does not tell you anything about compatibility). We're making slow progress towards improved static linking, but other things have higher priority for upstream work.
Using one of the smaller libcs instead might be a better alternative for you. The other alternative would be to inject the application along with its own dynamically-linked glibc.
Thanks @fweimer-rh Sounds like we should close this as a known limitation then? Also happy to keep it open and test as progress towards improved static linking is made.
Building it against musl libc should solve the issue.
ref: https://www.graalvm.org/22.0/reference-manual/native-image/StaticImages/#preparation
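To make it easier to tell which of these configurations a given Rust binary was built with, a small sketch like the following could print the C library environment and whether the C runtime is linked statically. It relies only on the compiler's `target_env` and `target_feature = "crt-static"` configuration values; the function names are hypothetical and just for illustration.

```rust
/// Name of the C library environment this binary was compiled for.
fn libc_env() -> &'static str {
    if cfg!(target_env = "musl") {
        "musl"
    } else if cfg!(target_env = "gnu") {
        "gnu"
    } else {
        "other"
    }
}

/// True when the C runtime is linked statically
/// (for example via RUSTFLAGS="-C target-feature=+crt-static").
fn is_crt_static() -> bool {
    cfg!(target_feature = "crt-static")
}

/// Statically linked glibc is the combination that failed in this thread.
fn is_static_glibc() -> bool {
    libc_env() == "gnu" && is_crt_static()
}

fn main() {
    println!("libc environment: {}", libc_env());
    println!("crt-static: {}", is_crt_static());
    if is_static_glibc() {
        println!("warning: static glibc build; run it only on the glibc version it was built with");
    }
}
```

On x86_64, a musl build can be produced with `rustup target add x86_64-unknown-linux-musl` followed by `cargo build --target x86_64-unknown-linux-musl`; whether an equivalent target is usable for ppc64le would need to be checked separately.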
Can this issue be closed?
- The failure with static builds only happens with glibc 2.32 (also already mentioned above).
- It's not specific to Rust: https://github.com/rust-lang/libc/issues/2054#issuecomment-829124715
This was a helpful reference for me to better understand when static linking to glibc breaks beyond the more commonly cited examples.
Thanks for the updates, I'll close based on the above. If there is more to figure out here, feel free to create a discussion, or reopen/create a new issue if our libc needs to do something.