
Services crash without log.

Open simon-zumbrunnen opened this issue 3 years ago • 6 comments

Because I use Traefik for all of my services, I can't use the quickstart, so I'm trying to deploy open-balena using my own docker-compose.yml. One problem I ran into is that your services don't show any logs when they crash. All I see is:

Systemd init system enabled.

But since it crashed, I can't exec into the container to look at the log files. For the api I solved the problem by creating my own Dockerfile without systemd. For almost all the others I used the original image instead of the balena one (e.g. registry:2 or postgres). But for the vpn service this isn't that easy.

Do you have any guidance on how to debug this?

simon-zumbrunnen avatar Dec 17 '20 20:12 simon-zumbrunnen

Same issue here on Ubuntu 21.10, but I'm using ./scripts/compose up. I get:

openbalena-s3-1             | Systemd init system enabled.
openbalena-s3-1 exited with code 255
openbalena-api-1            | Systemd init system enabled.
openbalena-registry-1       | Systemd init system enabled.
openbalena-api-1 exited with code 255
openbalena-registry-1 exited with code 255
openbalena-vpn-1            | Systemd init system enabled.
openbalena-vpn-1 exited with code 255

gabrielepmattia avatar Jan 16 '22 08:01 gabrielepmattia

Most services use systemd. You can get logs by running ./scripts/compose exec -it <service> journalctl -fn100.

dfunckt avatar Jan 17 '22 12:01 dfunckt

Yeah but as I said, because the container is not running (crashed) I can't use exec.

simon-zumbrunnen avatar Jan 17 '22 15:01 simon-zumbrunnen

@seimsel, on which OS are you trying to run open-balena? Did you run the containers as privileged (see below)?

However, after several tests, it seems that the problem is in the image https://github.com/balena-io-modules/open-balena-base. If you start the image on Ubuntu 18.04, it works; on other distros such as Ubuntu 21.XX or Fedora 33, the final container entrypoint, exec /sbin/init, crashes. The command that I used is the following:

docker run --privileged -it -v /sys/fs/cgroup:/sys/fs/cgroup:ro balena/open-balena-base

If it works (i.e. the container starts), then you can run open-balena. Remember to run the container in privileged mode and to mount the cgroup folder, since it spawns multiple processes by using the init process as the container entrypoint.

On Fedora 33/Ubuntu 21.10 I get

Systemd init system enabled.
systemd 247.3-6 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
Detected virtualization lxc.
Detected architecture x86-64.

Welcome to Debian GNU/Linux 11 (bullseye)!

Set hostname to <261f1fa6b3af>.
Failed to create /init.scope control group: Read-only file system
Failed to allocate manager object: Read-only file system
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...
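
A minimal sketch for spotting this failure in saved console output. It assumes the output of the docker run -it command above was saved to a file; the file name base.log and the function name diagnose_systemd_log are hypothetical, and the only signature it knows is the one line taken from the log above.

```shell
#!/bin/sh
# Hypothetical helper: reads systemd console output on stdin and reports
# whether it contains the cgroup failure line shown in the log above.
diagnose_systemd_log() {
    # -F matches the line as a fixed string, -q suppresses grep's own output.
    if grep -qF 'Failed to create /init.scope control group: Read-only file system'; then
        echo "cgroup-read-only"
    else
        echo "no-known-signature"
    fi
}
```

Usage would be something like diagnose_systemd_log < base.log. A result of cgroup-read-only means systemd could not create its control group under the read-only /sys/fs/cgroup mount.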

gabrielepmattia avatar Jan 20 '22 17:01 gabrielepmattia

Thank you for your help. I was using Ubuntu 18.04 and I probably didn't run it in privileged mode. But my question isn't: "what do I have to do to make it work", but: "how do I figure out why it failed".

simon-zumbrunnen avatar Mar 14 '22 06:03 simon-zumbrunnen

This problem occurs when your host OS uses cgroup v2 exclusively, with no cgroup v1. The balena containers want to start systemd inside the container, but this is not possible with just cgroup v2. To make your host OS use cgroup v2 and v1 together, run as root:

echo 'GRUB_CMDLINE_LINUX=systemd.unified_cgroup_hierarchy=false' > /etc/default/grub.d/cgroup.cfg
update-grub

and restart.
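
To check which mode a host is in, before or after changing the boot parameter, here is a minimal sketch. It assumes GNU stat is available and relies on /sys/fs/cgroup having filesystem type cgroup2fs on a cgroup v2-only host and tmpfs when cgroup v1 controllers are mounted. The function name cgroup_mode_from_fstype is hypothetical.

```shell
#!/bin/sh
# Map the filesystem type of /sys/fs/cgroup to a cgroup mode.
#   cgroup2fs -> unified hierarchy (cgroup v2 only)
#   tmpfs     -> legacy or hybrid hierarchy (cgroup v1 controllers mounted)
cgroup_mode_from_fstype() {
    case "$1" in
        cgroup2fs) echo "unified" ;;
        tmpfs)     echo "legacy-or-hybrid" ;;
        *)         echo "unknown" ;;
    esac
}

# Only query the host when the cgroup mount point exists.
if [ -d /sys/fs/cgroup ]; then
    # Fall back to "unknown" if stat does not support -f/-c on this system.
    fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo unknown)
    echo "cgroup mode: $(cgroup_mode_from_fstype "$fstype") (fs type: $fstype)"
fi
```

If this prints unified, the host matches the failing setup described above; after the GRUB change and a reboot it should print legacy-or-hybrid.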

markdegrootnl avatar May 11 '22 13:05 markdegrootnl