
TeddyCloud Service Fails to Start Sporadically After System Reboot

Open XerXes777 opened this issue 1 year ago • 11 comments

Describe the bug

After a system reboot, the TeddyCloud service sporadically fails to start correctly. While the LXC container and the Docker container appear to be running, TeddyCloud is inaccessible via the browser. Additionally, the CPU load in Proxmox increases to ~20%. A manual restart of the LXC container resolves the issue.

This issue was only noticed because the system needs to restart regularly due to the scheduled backup process.

To Reproduce

Steps to reproduce the behavior:

1. Reboot the Proxmox host or restart the LXC container.
2. The LXC container starts, and Docker reports that TeddyCloud is running.
3. Attempt to access TeddyCloud via the browser.
4. TeddyCloud is inaccessible.
5. Observe that CPU load in Proxmox increases to ~20%.
6. Restart the LXC container manually.
7. TeddyCloud becomes accessible again.
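
The failure state in these steps (container reported as running, web interface unreachable) can be detected from a script. The following is a minimal sketch, assuming bash with `/dev/tcp` support and the coreutils `timeout` command. `port_open` is a hypothetical helper, and the host and port are placeholders, not values taken from this report.

```shell
#!/bin/bash

# Hypothetical helper: succeed if a TCP connection to host:port can be
# opened within 3 seconds, fail otherwise.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Placeholder address: replace with the address and port TeddyCloud is published on.
if port_open 127.0.0.1 80; then
  echo "web port reachable"
else
  echo "web port NOT reachable"
fi
```

This only checks that the TCP port accepts connections. A process that accepts a connection but never answers would still pass, so an HTTP request with a timeout would be a stricter probe.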

Expected behavior

TeddyCloud should start correctly and be accessible after a system reboot or container restart, without requiring manual intervention.

Screenshots

N/A

Technical Details:

  • TeddyCloud Version: v0.6.3
  • Proxmox PVE Version: 8.3.3
  • Proxmox Backup Server (PBS) Version: 3.3.2
  • LXC Container: Docker LXC (Installed via HelperScript)
  • Docker Version: 27.4.1
  • Docker image used: ghcr.io/toniebox-reverse-engineering/teddycloud:lates

Attach logs of teddyCloud

_teddycloud_logs (8).txt

Additional context

  • The issue is not caused by the Proxmox backup process itself.
  • The backup is scheduled for 08:00 AM, but logs show a 1-hour offset (logged at 07:00 AM).
  • The problem occurs sporadically after a system reboot.

XerXes777 avatar Feb 17 '25 09:02 XerXes777

This also happens once in a while on my test system. The reason is unknown: teddyCloud doesn't start at all, so there is no log output in that case.

SciLor avatar Feb 17 '25 10:02 SciLor

Since this condition occurs every 2-5 days, I would like to narrow down the error further. I suspect that the teddycloud service is the problem, because Proxmox itself and the Docker LXC start up completely. Portainer in Docker LXC is also accessible. The teddycloud Docker container is also running according to Portainer.

As soon as I log into the shell of the container and start teddycloud manually, the service runs normally. This means that only teddycloud does not start automatically. Therefore, I don't see any indication of a proxmox problem.

I need help to narrow down the error further. What debugging options are available in the teddycloud docker container?

XerXes777 avatar Feb 27 '25 08:02 XerXes777

As there is no log output from teddycloud, it can only be happening here:

https://github.com/toniebox-reverse-engineering/teddycloud/blob/93b4ecd6a0c546d27c1ba487ede9aa01a9038274/src/main.c#L235

int_t main(int argc, char *argv[])
{
    char cwd[PATH_LEN] = {0};
    error_text_init();

    get_settings()->log.level = TRACE_LEVEL_WARNING;

    TRACE_PRINTF(BUILD_FULL_NAME_LONG "\r\n\r\n");

error_text_init just fills an array with data, and get_settings also just writes data to existing structures.

teddycloud itself has nothing built in that would help with debugging directly. It may be possible to add a DEBUG ENV variable and enable strace or something similar to see what the system does.

SciLor avatar Feb 27 '25 08:02 SciLor

I have modified docker-entrypoint.sh from the teddycloud Docker container. Maybe this will give me more information?

#!/bin/bash

set -o errexit
set -o pipefail
set -o nounset
set -o xtrace   # Enable command tracing for debugging

mkdir -p /teddycloud/certs/server /teddycloud/certs/client

if [ -n "${DOCKER_TEST:-}" ]; then
  cd /teddycloud
  LSAN_OPTIONS=detect_leaks=0 teddycloud --docker-test
else
  while true
  do
    cd /teddycloud
    echo "Starting teddycloud at $(date)"
    teddycloud
    retVal=$?
    echo "teddycloud exited with code $retVal at $(date)"
    # Optional: Add a short delay before restarting
    sleep 5
    if [ $retVal -ne -2 ]; then
        exit $retVal
    fi
  done
fi
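
One thing worth checking in the script above: with `set -o errexit`, a non-zero exit from the plain `teddycloud` call ends the script before `retVal=$?` runs, and shell exit statuses are limited to 0 through 255, so a comparison against `-2` never matches. The following is a minimal sketch of capturing the status safely. `run_capture` is a hypothetical helper, and treating 254 as the restart code is an assumption derived from the `-2` in the script, not confirmed against the teddyCloud source.

```shell
#!/bin/bash
set -o errexit
set -o nounset

# Hypothetical helper: run a command and store its exit status in rc.
# The "||" keeps errexit from aborting the script on a non-zero status.
run_capture() {
  rc=0
  "$@" || rc=$?
}

# Exit statuses are 8-bit, so a program calling exit(-2) is seen as 254.
restart_code=$(( -2 & 255 ))

run_capture bash -c 'exit 254'   # stand-in for the teddycloud call
if [ "$rc" -eq "$restart_code" ]; then
  echo "restart requested (status $rc)"
else
  echo "exiting with status $rc"
fi
```

In the entrypoint loop, the same pattern would replace the `teddycloud` / `retVal=$?` pair, so the echo after the call is reached even when the process exits with an error.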

XerXes777 avatar Feb 27 '25 08:02 XerXes777

I have added it to the latest develop build.

Additionally, setting the environment variable STRACE: 1 enables strace.

It is also possible to attach strace to a running process if this happens again with this build (without the env variable set).
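
Attaching to an already running process could look like the sketch below. It assumes `strace` and `pidof` are available inside the container and that the container is allowed to use ptrace (on some hosts this may need `--cap-add=SYS_PTRACE`). `attach_strace` is a hypothetical helper, and the output path is a placeholder.

```shell
#!/bin/bash

# Hypothetical helper: attach strace to a running process.
#   -f   follow threads and child processes
#   -tt  prefix each line with a timestamp
#   -p   attach to the given PID
#   -o   write the trace to a file instead of stderr
attach_strace() {
  local pid="$1" out="${2:-/tmp/teddycloud.strace}"
  if [ -z "$pid" ]; then
    echo "no PID given" >&2
    return 1
  fi
  strace -f -tt -p "$pid" -o "$out"
}

# Usage inside the container (stop with Ctrl+C):
#   attach_strace "$(pidof teddycloud)"
```

If the trace file stays empty while the CPU is busy, the process is likely spinning in user space rather than blocked in a system call, which would point to a debugger or a thread dump as the next step.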

SciLor avatar Feb 27 '25 09:02 SciLor

I got these logs info, not more (v0.6.3):

2025-03-17T07:01:11.510674000Z + mkdir -p /teddycloud/certs/server /teddycloud/certs/client

2025-03-17T07:01:11.512045000Z + '[' -n '' ']'

2025-03-17T07:01:11.512158000Z + true

2025-03-17T07:01:11.512223000Z + cd /teddycloud

2025-03-17T07:01:11.512390000Z ++ date

2025-03-17T07:01:11.513250000Z Starting teddycloud at Mon Mar 17 07:01:11 UTC 2025

2025-03-17T07:01:11.513326000Z + echo 'Starting teddycloud at Mon Mar 17 07:01:11 UTC 2025'

2025-03-17T07:01:11.513404000Z + teddycloud

XerXes777 avatar Mar 17 '25 11:03 XerXes777

> I got these logs info, not more (v0.6.3)

Can you also enable strace?

I have also been getting this error after rebooting since v0.6.3, but on a regular Arch Linux install with Docker. The error still occurs with v0.6.4. There are no logs, and one thread runs at almost 100% CPU.

Strubbl avatar Mar 21 '25 20:03 Strubbl

> > I got these logs info, not more (v0.6.3)
>
> Can you also enable strace?

How?

XerXes777 avatar Mar 22 '25 12:03 XerXes777

You need to install strace in the container and then adjust the teddycloud call in your script to LSAN_OPTIONS=detect_leaks=0 strace teddycloud --docker-test

strace generates a lot of output. During a normal run of teddycloud, this might impact performance significantly.

Strubbl avatar Mar 25 '25 20:03 Strubbl

You don't need to install strace; it has been available within the container since 0.6.4. You can also enable strace with the strace: 1 ENV variable.

SciLor avatar Mar 26 '25 05:03 SciLor

JFYI: I have not experienced this issue in a while.

Strubbl avatar Nov 02 '25 21:11 Strubbl