
systemd unit `ldconfig.service` fails to start

Open ader1990 opened this issue 8 months ago • 6 comments

Description

The systemd unit ldconfig.service sometimes fails to start on the first boot or on subsequent reboots, but the failure cannot be reproduced reliably. ldconfig.service runs before the switchroot during the initrd stage, which makes it more cumbersome to reproduce properly. [UPDATE] ldconfig.service runs after switchroot.

This is an issue that I have seen in the wild for a while, usually after rebooting a Flatcar instance on ARM64. From what I know, this issue does not affect the functionality of the Flatcar instance.

This issue has recently recurred in GitHub Actions. From the GitHub Actions log:

    Error: _raid.go:245: could not reboot machine: machine __a8565d1f-ee8d-4220-9dbb-92c8f030eefb__ failed basic checks: some systemd units failed:
    ● ldconfig.service loaded failed failed Rebuild Dynamic Linker Cache
    status:
    journal:-- No entries --
    harness.go:593: Found systemd unit failed to start (ldconfig.service - Rebuild Dynamic Linker Cache.  ) on machine a8565d1f-ee8d-4220-9dbb-92c8f030eefb console_

While debugging this issue, I ran ldconfig -X manually and got a warning. The command exited with code 0 and printed the message: /lib/ld.so.conf is not an ELF file - it has the wrong magic bytes .... The warning might be caused by a wrong path for /lib/ld.so.conf, and it might be related to this commit: https://github.com/flatcar/scripts/commit/ba45a2bfb2b2b5e94ae4ee6bb1965a3a8080e3c1. I do not know yet whether the warning message and the systemd unit failure are related, but maybe the warning message is sometimes treated as error output and thus fails the unit.
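
To help separate the two signals from the manual ldconfig -X run described above, here is a minimal sketch that records the exit status and the stderr output independently. The helper name check_cmd is hypothetical and not part of Flatcar or glibc. For a Type=oneshot service, systemd decides success from the exit status of the process and not from what it writes to stderr, so a warning combined with exit code 0 should not fail the unit by itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of Flatcar): run a command, keep its
# stderr in a temporary file, and report the exit status separately.
check_cmd() {
    err_file=$(mktemp)
    status=0
    # stdout is discarded; only stderr and the exit status are of interest.
    "$@" >/dev/null 2>"$err_file" || status=$?
    echo "exit status: $status"
    if [ -s "$err_file" ]; then
        echo "stderr:"
        cat "$err_file"
    fi
    rm -f "$err_file"
    return "$status"
}

# Usage on an affected machine (needs root, rewrites /etc/ld.so.cache):
#   check_cmd /sbin/ldconfig -X
```

If the status stays 0 across many reboots while the unit still fails occasionally, the warning is probably unrelated to the failure.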

The definition of the systemd unit:

systemctl cat ldconfig.service
# /usr/lib/systemd/system/ldconfig.service
#  SPDX-License-Identifier: LGPL-2.1-or-later
#
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Rebuild Dynamic Linker Cache
Documentation=man:ldconfig(8)

ConditionNeedsUpdate=|/etc
ConditionFileNotEmpty=|!/etc/ld.so.cache

DefaultDependencies=no
After=local-fs.target
Before=sysinit.target systemd-update-done.service
Conflicts=shutdown.target initrd-switch-root.target
Before=shutdown.target initrd-switch-root.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/ldconfig -X
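
Since the journal shows no entries for the failed unit, it may help to ask systemd how it judged the last run. The sketch below condenses the key=value output of systemctl show into one line. The function name summarise_result is hypothetical; Result and ExecMainStatus are standard systemd service properties.

```shell
#!/bin/sh
# Hypothetical helper: read `systemctl show` key=value lines from stdin
# and print the unit result together with the main process exit status.
summarise_result() {
    result=""
    status=""
    while IFS='=' read -r key value; do
        case "$key" in
            Result) result=$value ;;
            ExecMainStatus) status=$value ;;
        esac
    done
    echo "result=$result exit_status=$status"
}

# Usage on an affected machine:
#   systemctl show -p Result -p ExecMainStatus ldconfig.service | summarise_result
#   journalctl -b -u ldconfig.service --no-pager
```

A Result of exit-code would point at ldconfig itself returning non-zero, while values such as signal or timeout would suggest the process was killed or did not finish in time.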

Impact

The test framework needs to re-run the test, sometimes 3 or 4 times. In real-world scenarios, the systemd unit failure might break automation, or sanity-check tools might flag it.

Environment and steps to reproduce

Example test run that had to retry some of the Mantle tests: https://github.com/flatcar/scripts/actions/runs/9777950641?pr=2089

Expected behavior

systemd unit ldconfig.service should not fail.

ader1990 • Jul 04 '24 08:07