panic: runtime error: invalid nil pointer dereference when point-to-point interface has nil dst address
$ journalctl -au balena
...
Apr 06 18:29:07 balenad[508945]: panic: runtime error: invalid memory address or nil pointer dereference
Apr 06 18:29:07 balenad[508945]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1272ec4]
As further detailed in the linked support thread.
On investigation, we found that the bug is in the third-party netlink library and has already been fixed upstream by the following pull request:
https://github.com/vishvananda/netlink/pull/665
IFA_ADDRESS is to be used as the peer address if it differs from IFA_LOCAL. Therefore, include the check for "no IFA_ADDRESS" in the difference check. Example: ppp interfaces can contain IFA_LOCAL and no IFA_ADDRESS attribute.
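The idea behind that fix can be sketched in a few lines of Go. This is a simplified illustration, not the library's actual code: `selectAddr` and `hostNet` are hypothetical names, and the sketch only assumes that the parsed IFA_LOCAL and IFA_ADDRESS attributes arrive as `*net.IPNet` values, with `nil` meaning the attribute was absent.

```go
package main

import (
	"fmt"
	"net"
)

// hostNet is a hypothetical helper that wraps an IPv4 address in a /32 IPNet.
func hostNet(s string) *net.IPNet {
	return &net.IPNet{IP: net.ParseIP(s).To4(), Mask: net.CIDRMask(32, 32)}
}

// selectAddr is a hypothetical stand-in for the address selection logic.
// local models IFA_LOCAL and dst models IFA_ADDRESS; either may be nil.
// It returns the interface address and, if one exists, the peer address.
func selectAddr(local, dst *net.IPNet) (addr, peer *net.IPNet) {
	if local == nil {
		// Only IFA_ADDRESS is present, so it is the interface address.
		return dst, nil
	}
	// The "dst != nil" guard is the point of the sketch: without it,
	// dst.IP dereferences a nil pointer when IFA_ADDRESS is missing.
	if dst != nil && local.IP.Equal(dst.IP) {
		return dst, nil
	}
	// IFA_ADDRESS differs from IFA_LOCAL (or is absent): treat it as the peer.
	return local, dst
}

func main() {
	// A ppp-style interface: IFA_LOCAL set, no IFA_ADDRESS attribute.
	addr, peer := selectAddr(hostNet("10.164.233.243"), nil)
	fmt.Println(addr.String(), peer == nil)
}
```

With the guard in place, an interface that carries only IFA_LOCAL yields a local address and a nil peer instead of a panic.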
Related reference: https://stackoverflow.com/questions/4678637/what-is-difference-between-ifa-local-and-ifa-address-in-rtnetlink-linux
Known workaround
Not yet confirmed at the time of this writing, but based on the linked support thread, I believe that the immediate cause of the engine panic is the following P-t-P:0.0.0.0 value on the ppp0 interface:
$ ifconfig
ppp0 Link encap:Point-to-Point Protocol
inet addr:10.164.233.243 P-t-P:0.0.0.0 Mask:255.255.255.255
I suspect that setting a value other than 0.0.0.0 would avoid the engine panic. In the linked support thread, I understand that interface ppp0 was associated with a cellular (gsm) internet connection, for which NetworkManager reported an IPv4 gateway of 0.0.0.0:
root@d3ba86f:~# nmcli device show
GENERAL.DEVICE: ttyS0
GENERAL.TYPE: gsm
GENERAL.HWADDR: (unknown)
GENERAL.MTU: 1500
GENERAL.STATE: 100 (connected)
GENERAL.CONNECTION: cellular
GENERAL.CON-PATH: /org/freedesktop/NetworkManager/ActiveConnection/1
IP4.ADDRESS[1]: 10.164.233.243/32
IP4.GATEWAY: 0.0.0.0
IP4.ROUTE[1]: dst = 0.0.0.0/0, nh = 0.0.0.0, mt = 20700
IP4.DNS[1]: 194.151.228.34
IP4.DNS[2]: 194.151.228.18
Above, I believe it is unusual for IP4.GATEWAY to have the value 0.0.0.0. I suspect that this IP4.GATEWAY value corresponds to P-t-P:0.0.0.0 in the ifconfig output for the ppp0 interface, and that setting a value other than 0.0.0.0 for IP4.GATEWAY would work around the engine panic.
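To check whether a device has any interface of the kind discussed here, a small Go program can list the interfaces flagged as point-to-point. This is only a diagnostic sketch using the standard library: `pointToPoint` is a hypothetical helper name, and the standard `net` package does not expose the peer address, so the sketch shows which interfaces are candidates, not whether their peer address is unset.

```go
package main

import (
	"fmt"
	"net"
)

// pointToPoint returns the names of interfaces that are up and carry
// the point-to-point flag (as ppp0 does in the ifconfig output above).
func pointToPoint(ifaces []net.Interface) []string {
	var names []string
	for _, ifi := range ifaces {
		if ifi.Flags&net.FlagPointToPoint != 0 && ifi.Flags&net.FlagUp != 0 {
			names = append(names, ifi.Name)
		}
	}
	return names
}

func main() {
	ifaces, err := net.Interfaces()
	if err != nil {
		fmt.Println("listing interfaces failed:", err)
		return
	}
	for _, name := range pointToPoint(ifaces) {
		fmt.Println("point-to-point interface:", name)
	}
}
```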
[pdcastro] This issue has attached support thread https://jel.ly.fish/8e060183-5225-4114-9bfb-43469299a6dd
Hi,
I kept running some tests, and reverting to balenaOS 2.85.2+rev3 seems to work.
I have no idea why or how...
I hope it can help solve the issue.
Thanks
You are correct. It seems that the bug arrived with the update of balena-engine to upstream moby v20.10.12, which shipped in balenaOS v2.94.0. The bug would therefore not be present in balenaOS 2.85.2, which has balena-engine v19.03.30.
[majorz] This has attached https://jel.ly.fish/b8cffc99-86da-4961-88af-51ecbb4aa590
A workaround is to enable balena-host at boot, before the network interfaces come up. To do that, modify the balena-host.service unit and add:
[Install]
WantedBy=multi-user.target
And then do:
systemctl enable balena-host
reboot
That should bring the host engine up at boot and make it possible to update the host OS to a patched version.