
armv7 QNAP Docker: Restarting loop - Core finish process exit code 256 - Core finish process received signal 11

Open magicse opened this issue 1 year ago • 53 comments

Versions 2022.7 and later are currently stuck in a boot loop on a QNAP NAS TS-231P.

INFO: Home Assistant Core finish process exit code 256
INFO: Home Assistant Core finish process received signal 11
…

What version of Home Assistant Core has the issue? 2022.7+

What was the last working version of Home Assistant Core? 2022.6.7

What type of installation are you running? Home Assistant Container

Additional information Running on QNAP NAS (TS-231P).

Processor: Annapurna Labs Alpine AL314 Quad-core ARM Cortex-A15 CPU @ 1.70GHz

magicse avatar Jan 25 '23 00:01 magicse

https://github.com/home-assistant/core/issues/74707#issuecomment-1241101866

magicse avatar Jan 25 '23 00:01 magicse

Every version from 2022.7 onward fails to start on Docker (QNAP).

The same error message occurs each time.

[14:44:09] INFO: Home Assistant Core finish process exit code 256
[14:44:09] INFO: Home Assistant Core finish process received signal 11
[14:44:10] INFO: Home Assistant Core finish process exit code 256
[14:44:10] INFO: Home Assistant Core finish process received signal 11
[14:44:11] INFO: Home Assistant Core finish process exit code 256
[14:44:11] INFO: Home Assistant Core finish process received signal 11

It is looping infinitely.

No discussion offers a resolution other than reverting to an earlier version, 2022.6.
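For readers unfamiliar with these two log lines: they appear to come from the container's s6 finish script, where an exit code of 256 means the service was killed by a signal and the second number is the signal itself. A minimal sketch for decoding the pair (the function name `describe_finish` is made up for illustration and is not part of Home Assistant):

```python
import signal


def describe_finish(exit_code: int, signal_number: int) -> str:
    """Explain the exit code / signal pair logged when the service stops."""
    if exit_code == 256:
        # Under s6, 256 means "killed by a signal"; look up the signal's name.
        name = signal.Signals(signal_number).name
        return f"killed by {name} (signal {signal_number})"
    return f"exited normally with status {exit_code}"


print(describe_finish(256, 11))
```

On Linux, signal 11 is SIGSEGV, so the loop above is a segmentation fault repeated on every restart rather than an ordinary startup error.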

magicse avatar Jan 25 '23 03:01 magicse

#75142

Wetzel402 avatar Jan 25 '23 15:01 Wetzel402

This issue might be resolved finally. If the latest official container still isn't working you can try linuxserver.io's container.

Wetzel402 avatar Jan 25 '23 15:01 Wetzel402

@magicse can you try with a pre-release version, like the new one from today 2023.2.0b9?

That way we can confirm whether it's related to the problem in #75142.

Because the fix was for armv6 and I'm not sure the problem with your device is the same, although it seems similar.

Gerigot avatar Feb 01 '23 15:02 Gerigot

@Gerigot Due to the constant power outage, it's a little difficult for me to do this, but I'll try.

magicse avatar Feb 02 '23 12:02 magicse

I have the same problem with a QNAP TS-431X. Tested versions 2023.3.0.dev20230202, 2023.2.0.dev20230120, 2023.2.0b9 and current "stable".

2022.6.7 works well.

boyarale avatar Feb 03 '23 11:02 boyarale

@boyarale, have you tried linuxserver.io's image? Please test and report back.

Wetzel402 avatar Feb 03 '23 15:02 Wetzel402

Because the fix was for armv6 and I'm not sure the problem with your device is the same, although it seems similar.

QNAP TS-231p3 is arm32v7

magicse avatar Feb 03 '23 17:02 magicse

I've seen others discussing this problem with QNAP, and they solved it by using the Linuxserver.io image. Unfortunately, I don't have a QNAP, so I can't investigate the problem.

Gerigot avatar Feb 03 '23 17:02 Gerigot

It may be possible to somehow analyze what has changed during the transition from version 2022.6.7. There should be some explanation for this phenomenon.

magicse avatar Feb 03 '23 17:02 magicse

I think the problem lies in some libraries that HA uses, not in the core itself, so it's very difficult to investigate without a device. I fixed the problem for the "pi zero" because that is what I run HA on, so I could test whether the solution was good.

Gerigot avatar Feb 03 '23 17:02 Gerigot

like the new one from today 2023.2.0b9

My test config

version: '3.8'
services:
  homeassistant:
    restart: always
    image: homeassistant/home-assistant:2023.2.0b9
    network_mode: host
    privileged: true
    environment:
      - DISABLE_JEMALLOC=true
      - TZ=Europe/Berlin
    volumes:
      - /share/Container/hass-config:/config

Error log

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
services-up: info: copying legacy longrun home-assistant (no readiness notification)
s6-rc: info: service legacy-services successfully started
[17:41:21] INFO: Home Assistant Core finish process exit code 256
[17:41:21] INFO: Home Assistant Core finish process received signal 11
[17:41:22] INFO: Home Assistant Core finish process exit code 256
[17:41:22] INFO: Home Assistant Core finish process received signal 11
[17:41:23] INFO: Home Assistant Core finish process exit code 256

magicse avatar Feb 03 '23 17:02 magicse

bash-5.1# python3 
Python 3.10.7 (main, Nov 24 2022, 13:02:43) [GCC 11.2.1 20220219] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jwt
Segmentation fault
bash-5.1# 
bash-5.1# python3 
Python 3.10.7 (main, Nov 24 2022, 13:02:43) [GCC 11.2.1 20220219] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from cryptography.hazmat.primitives.asymmetric import ec, padding
Segmentation fault
bash-5.1# 
bash-5.1# gdb --args python -c "import sys, numpy; print(numpy.__version__, sys.version)"
GNU gdb (GDB) 11.2
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "armv7-alpine-linux-musleabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...
(No debugging symbols found in python)
(gdb) r
Starting program: /usr/local/bin/python -c import\ sys,\ numpy\;\ print\(numpy.__version__,\ sys.version\)

Program received signal SIGSEGV, Segmentation fault.
0x75fb792c in ?? () from /lib/ld-musl-armhf.so.1

apk --print-arch gives armv7, but /lib/libc.musl-armv7.so.1 is a link to /lib/ld-musl-armhf.so.1, which is somewhat strange.
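Because a segfault kills the interpreter, checking modules one by one in an interactive session is slow. A small sketch that runs each import in a child process and reports how it ended; the helper names `probe_code` and `probe_imports` are illustrative, and the module list is simply the three packages mentioned in this thread:

```python
import signal
import subprocess
import sys


def probe_code(code: str) -> str:
    """Run `code` in a fresh interpreter and report how the child ended."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode == 0:
        return "ok"
    if result.returncode < 0:
        # A negative return code means the child was killed by that signal.
        return signal.Signals(-result.returncode).name
    return f"error (exit status {result.returncode})"


def probe_imports(modules):
    """Map each module name to the outcome of importing it."""
    return {name: probe_code(f"import {name}") for name in modules}


if __name__ == "__main__":
    for name, outcome in probe_imports(["jwt", "cryptography", "numpy"]).items():
        print(f"{name}: {outcome}")
```

Run inside the container, a module whose native extension crashes should show up as SIGSEGV, while a module that is merely missing shows up as an ordinary error.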

magicse avatar Feb 03 '23 22:02 magicse

https://github.com/home-assistant/core/issues/75142#issuecomment-1200269467

Just for completeness, on 2022.6.7 both imports, numpy and jwt, work without issues:

magicse avatar Feb 04 '23 09:02 magicse

https://github.com/home-assistant/core/issues/75142#issuecomment-1287606304

magicse avatar Feb 04 '23 09:02 magicse

@Wetzel402

This issue might be resolved finally. If the latest official container still isn't working you can try linuxserver.io's container.

linuxserver.io's Home Assistant 2023.2.1 container works well.

magicse avatar Feb 04 '23 11:02 magicse

I was hoping that @Gerigot's fix would also correct the issue with QNAP but it appears that isn't the case. More investigation and research is needed...

Wetzel402 avatar Feb 06 '23 16:02 Wetzel402

@Wetzel402 No problem. We can provide any logs and run any tests needed for the analysis.

magicse avatar Feb 06 '23 21:02 magicse

@magicse, right now I think comparing the repositories to find differences would be a good start. Why does linuxserver.io's image run when the official does not?

Wetzel402 avatar Feb 07 '23 17:02 Wetzel402

I'm already noticing that linuxserver.io has a dedicated dockerfile for armhf.

Edit: They also use their own wheels. This makes me suspect it is a wheels issue as some have previously suspected...

One more edit... Linuxserver is using their own Alpine 3.16 base image.

Wetzel402 avatar Feb 07 '23 17:02 Wetzel402

It could be worth trying to build the docker image using the Linuxserver base image with official HA wheels, as well as vice versa.

Wetzel402 avatar Feb 07 '23 18:02 Wetzel402

It is really strange behavior. Unfortunately, I don't have an ARMv7 device to run tests myself, so it will be really difficult to find out what's going on.

Gerigot avatar Feb 07 '23 18:02 Gerigot

@Wetzel402

Edit: They also use their own wheels. This makes me suspect it is a wheels issue as some have previously suspected...

Maybe you are right and they use different wheels, for example for cryptography. Judging by the file names, they don't use musllinux for the armv7l arch; musllinux is used only for the aarch64 and x86_64 archs. Also, the aarch64 and x86_64 wheels are tagged cp36-abi3 (the Python 3.6 stable ABI), while the armv7l and armv8l wheels are built for Python 3.10 (cp310).

[cryptography-39.0.0-cp310-cp310-linux_armv7l.whl](https://wheels.linuxserver.io/alpine-3.16/cryptography-39.0.0-cp310-cp310-linux_armv7l.whl)
[cryptography-39.0.0-cp310-cp310-linux_armv8l.whl](https://wheels.linuxserver.io/alpine-3.16/cryptography-39.0.0-cp310-cp310-linux_armv8l.whl)
[cryptography-39.0.0-cp36-abi3-musllinux_1_1_aarch64.whl](https://wheels.linuxserver.io/alpine-3.16/cryptography-39.0.0-cp36-abi3-musllinux_1_1_aarch64.whl)
[cryptography-39.0.0-cp36-abi3-musllinux_1_1_x86_64.whl](https://wheels.linuxserver.io/alpine-3.16/cryptography-39.0.0-cp36-abi3-musllinux_1_1_x86_64.whl)
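The tags can be read straight from the file names. A minimal sketch that splits a wheel file name into its fields, assuming the simple `name-version-python-abi-platform.whl` layout with no build tag; `parse_wheel_filename` is an illustrative helper, not a pip API:

```python
def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel file name into name, version and its three tags."""
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # The last three fields are the Python tag, ABI tag and platform tag.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


wheels = [
    "cryptography-39.0.0-cp310-cp310-linux_armv7l.whl",
    "cryptography-39.0.0-cp310-cp310-linux_armv8l.whl",
    "cryptography-39.0.0-cp36-abi3-musllinux_1_1_aarch64.whl",
    "cryptography-39.0.0-cp36-abi3-musllinux_1_1_x86_64.whl",
]

for wheel in wheels:
    tags = parse_wheel_filename(wheel)
    print(f'{tags["platform"]:<24} python={tags["python"]} abi={tags["abi"]}')
```

Note that a `cp36-abi3` wheel is not limited to Python 3.6: the stable ABI tag means it can be installed on CPython 3.6 and later, whereas `cp310-cp310` wheels only fit Python 3.10.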

Debugging the official image gives me an error in ld-musl-armhf.so.1 while importing (for example) numpy or cryptography

bash-5.1# gdb --args python -c "import sys, numpy; print(numpy.__version__, sys.version)"
(gdb) r
Starting program: /usr/local/bin/python -c import\ sys,\ numpy\;\ print\(numpy.__version__,\ sys.version\)

Program received signal SIGSEGV, Segmentation fault.
0x75fb792c in ?? () from /lib/ld-musl-armhf.so.1

I think we could start from this point. For example, the goal would be for a simple import of numpy inside the container to succeed without a segmentation fault.

bash-5.1# python3 
Python 3.10.7 (main, Nov 24 2022, 13:02:43) [GCC 11.2.1 20220219] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
Segmentation fault
bash-5.1# 

magicse avatar Feb 08 '23 03:02 magicse

QNAP is armv7l. The problem may be an armv7l <--> armv7hf mismatch: the package architecture (armv7l) does not match the system (armhf). It is also not armv7hf-musl, armv7hf or armv7l-musl.

bash-5.1# cat /proc/cpuinfo
processor : 0
model name : Annapurna Labs Alpine AL314 Quad-core ARM Cortex-A15 CPU @ 1.70GHz
Speed : 1.7GHz
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc0f
CPU revision : 4
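The relevant parts of that output are the architecture level and the floating-point features. A rough sketch that pulls them out of the text; the sample is copied from the output above, and the rule "ARMv7 plus vfpv3 means the CPU can run a hard-float userland" is my simplification, not an official check:

```python
def parse_cpuinfo(text: str) -> dict:
    """Turn 'key : value' lines into a dict, keeping the first value per key."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info.setdefault(key.strip(), value.strip())
    return info


sample = """\
processor : 0
model name : Annapurna Labs Alpine AL314 Quad-core ARM Cortex-A15 CPU @ 1.70GHz
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU architecture: 7
"""

info = parse_cpuinfo(sample)
features = set(info["Features"].split())

# Simplified rule of thumb: ARMv7 with VFPv3 has the hardware FPU that a
# hard-float (armhf / armv7) userland expects.
hard_float_capable = info["CPU architecture"] == "7" and "vfpv3" in features

print("architecture:", info["CPU architecture"])
print("neon:", "neon" in features)
print("hard-float capable:", hard_float_capable)
```

If this holds, the CPU itself is not missing any floating-point support, which points back at how the binaries were built rather than at the hardware.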

magicse avatar Feb 08 '23 04:02 magicse

@Gerigot @Wetzel402 I have tried to build wheels for armv7 on 64-bit Ubuntu with the following commands:

export ARCH=armv7
sudo docker build  --build-arg CPYTHON_ABI=cp310 --build-arg BUILD_FROM=ghcr.io/home-assistant/wheels/${ARCH}/musllinux_1_2/cp310:dev --build-arg BUILD_ARCH=${ARCH} --tag wheel-builder:${ARCH} .

Every time, I got an error about importing cmake.

After that I compared the Dockerfile and requirements_cp310.txt against the files at "https://wheels.home-assistant.io/musllinux/" and found different versions of cmake: requirements_cp310.txt pins cmake 3.25.2, while the cmake wheel for armv7 at that link is 3.22.2. After changing the cmake version in requirements_cp310.txt to 3.22.2, everything builds fine.
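This kind of mismatch can be checked mechanically. A small sketch that lists pinned requirements with no matching version in an index; the `index` dict is a hand-written stand-in holding only the two cmake versions mentioned here, not a real listing of the wheels site, and `unavailable_pins` is an illustrative helper:

```python
def unavailable_pins(requirements_text: str, available: dict) -> list:
    """Return (name, version) pins that the index does not provide."""
    missing = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # only exact pins are checked in this sketch
        name, version = (part.strip() for part in line.split("==", 1))
        if version not in available.get(name.lower(), set()):
            missing.append((name, version))
    return missing


requirements = "cmake==3.25.2\n"
index = {"cmake": {"3.22.2"}}  # assumed index contents, taken from this thread

print(unavailable_pins(requirements, index))
```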

Could you check this problem with versions?

magicse avatar Feb 15 '23 13:02 magicse

@magicse,

The cmake version was only changed a couple of weeks ago so I'm not sure that's our problem. Theoretically something changed between the June and July builds of HA and we need to find it.

Wetzel402 avatar Feb 15 '23 20:02 Wetzel402

I think we need to dig into musllinux, because during debugging I get a segfault from ld-musl-armhf.so.1

gdb --args python -c "import sys, numpy; print(numpy.__version__, sys.version)"
Starting program: /usr/local/bin/python -c import\ sys,\ numpy\;\ print\(numpy.__version__,\ sys.version\)
Program received signal SIGSEGV, Segmentation fault.
0x75fb792c in ?? () from /lib/ld-musl-armhf.so.1

Also, additional information about the CPU, from ARMonQEMUforDebianUbuntu.md: ARMv7 CPUs are what Debian calls armhf (ARM hard float), and Cortex-A8, A9 and A15 are all ARMv7. ARMv5 and ARMv6 CPUs are what Debian calls armel. Raspberry Pi uses ARMv6; in this case, the CPU is arm1176.

And additional information: ARM v6 and v7 target musl Building a cross compiler targeting musl libc MUSL LIBC for ARMv6 and ARMv7 Building Unable run grpcio wheel on alpine arvm7 run grpcio wheel on alpine arvm7

magicse avatar Feb 16 '23 05:02 magicse

I rebuilt the numpy package inside the homeassistant container:

git clone https://github.com/numpy/numpy.git
cd numpy
python setup.py build -j 4 install

After that, it works well without a segfault.

bash-5.1# python
Python 3.10.7 (main, Nov 24 2022, 13:02:43) [GCC 11.2.1 20220219] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> 

magicse avatar Feb 16 '23 10:02 magicse

Test experiments inside of the homeassistant container

💡 pip show numpy

bash-5.1# pip show numpy
Name: numpy
Version: 1.23.2
Summary: NumPy is the fundamental package for array computing with Python.
Home-page: https://www.numpy.org
Author: Travis E. Oliphant et al.
Author-email: 
License: BSD
Location: /usr/local/lib/python3.10/site-packages
Requires: 
Required-by: contourpy, imageio, matplotlib, noaa-coops, pandas, pyairvisual, PyTurboJPEG

💡 python3 -c "import numpy"

bash-5.1# python3 -c "import numpy"
Segmentation fault 
bash-5.1# 

After reinstalling numpy with pip (which rebuilds it from source), everything works:

💡 pip install numpy --upgrade --no-cache-dir --force-reinstall --use-deprecated=legacy-resolver

bash-5.1#  apk add gcc g++
bash-5.1#  pip install numpy --upgrade --no-cache-dir --force-reinstall  --use-deprecated=legacy-resolver
Collecting numpy
  Downloading numpy-1.24.2.tar.gz (10.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.9/10.9 MB 919.0 kB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: numpy
  Building wheel for numpy (pyproject.toml) ... done
  Created wheel for numpy: filename=numpy-1.24.2-cp310-cp310-linux_armv7l.whl size=6837303 sha256=45539a718b34234f92ede59b7f65d6b363f0c0d85cc8d862b70d190527d633ae
  Stored in directory: /tmp/pip-ephem-wheel-cache-8pwduyiw/wheels/31/42/8e/88540c3411ed4734c7fd06056942e82136135724593ecec35a
Successfully built numpy
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.23.2
    Uninstalling numpy-1.23.2:
      Successfully uninstalled numpy-1.23.2
Successfully installed numpy-1.24.2

💡 python3 -c "import numpy"

bash-5.1# python3 -c "import numpy"
bash-5.1# 

💡 pip show numpy

bash-5.1# pip show numpy
Name: numpy
Version: 1.24.2
Summary: Fundamental package for array computing in Python
Home-page: https://www.numpy.org
Author: Travis E. Oliphant et al.
Author-email: 
License: BSD-3-Clause
Location: /usr/local/lib/python3.10/site-packages
Requires: 
Required-by: contourpy, imageio, matplotlib, noaa-coops, pandas, pyairvisual, PyTurboJPEG
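`pip show` does not say which wheel an installed package came from, but the package's WHEEL metadata file does. A minimal sketch that reads the recorded tags; the expectation that the locally built copy reports `linux_armv7l` comes from the pip log above, while what the originally shipped copy reports is something I have not verified. The helper names are illustrative:

```python
from importlib import metadata


def parse_wheel_tags(wheel_text: str) -> list:
    """Return the 'Tag:' values from the text of a WHEEL metadata file."""
    return [
        line.split(":", 1)[1].strip()
        for line in wheel_text.splitlines()
        if line.startswith("Tag:")
    ]


def installed_wheel_tags(dist_name: str) -> list:
    """Read the WHEEL file of an installed distribution, if there is one."""
    try:
        text = metadata.distribution(dist_name).read_text("WHEEL")
    except metadata.PackageNotFoundError:
        return []
    return parse_wheel_tags(text or "")


print(installed_wheel_tags("numpy"))
```

Running this before and after the forced reinstall would show whether the crashing and the working numpy were built for different platform tags.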

magicse avatar Feb 18 '23 11:02 magicse