rpicam-apps

Rpi 4 8gb + HQ Cam + Thomas Jacquin Allsky + Arducam Ethernet Extension

Open MrNbaphoto opened this issue 8 months ago • 81 comments

Hi everyone,

I am having an issue where image capture hangs. When I stop Allsky and then reboot, I get a small handful of photos and then it stops. I thought I had an issue with the RJ45 connector, so I replaced it, with no change. I then formatted the micro SD card and did a fresh install of the newest version of Raspberry Pi OS and of TJ Allsky. Same issue. I have done a fair bit of googling and tried a few things, but to no avail.

I have had some discussions on the Allsky Git Page https://github.com/AllskyTeam/allsky/discussions/4409

We have exhausted options and I was suggested to report it here in hope of some help.

I am very basic when it comes to Linux, so if you can be a little gentle and explain how to do things, that would be great. I am logging into the RPi via SSH until I get an HDMI to Micro HDMI cable to connect it to a screen.

Rpi 4 8gb, Rpi HQ Cam, TJ Allsky Current Version, Arducam Ethernet extension, Fresh Formatted 128gb Card, Fresh Install 64bit PI with Desktop

nbaphoto@nbaphoto:~ $ strace -p 71361
strace: Process 71361 attached
futex(0x7fe6f406fc, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY^Cstrace: Process 71361 detached
 <detached ...>

Thanks in advance. Micheal

MrNbaphoto avatar Apr 22 '25 02:04 MrNbaphoto

From https://github.com/AllskyTeam/allsky/discussions/4409:

[93:12:50.409301353] [47239] INFO Camera camera.cpp:1008 Pipeline handler in use by another process
ERROR: *** failed to acquire camera /base/soc/i2c0mux/i2c@1/imx477@1a ***

This suggests (as mentioned in the above discussion) that one of the earlier rpicam-still processes has stalled. Can you run rpicam-hello --version and provide the output please? Also, can you confirm the full command line of the process that has stalled?
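
To find the full command line of a stalled process, something like the following could be run over SSH. This is a minimal sketch, assuming a Linux system with `/proc` mounted; the function name `find_procs` is hypothetical and is not part of Allsky or rpicam-apps.

```shell
# find_procs NAME: print "PID CMDLINE" for every process whose executable
# name (as recorded in /proc/PID/comm) equals NAME.
find_procs() {
    name=$1
    for dir in /proc/[0-9]*; do
        # The process may exit while we scan, so tolerate missing files.
        [ -r "$dir/comm" ] || continue
        read -r comm 2>/dev/null < "$dir/comm" || continue
        [ "$comm" = "$name" ] || continue
        pid=${dir#/proc/}
        # /proc/PID/cmdline separates arguments with NUL bytes.
        cmdline=$(tr '\0' ' ' 2>/dev/null < "$dir/cmdline")
        printf '%s %s\n' "$pid" "$cmdline"
    done
    return 0
}
```

Calling `find_procs rpicam-still` should list each matching process with its arguments. Where procps is installed, `pgrep -a rpicam-still` gives similar output.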

naushir avatar Apr 22 '25 08:04 naushir

rpicam-hello
[6:37:52.560198419] [6074] INFO Camera camera_manager.cpp:327 libcamera v0.4.0+53-29156679
[6:37:52.601738308] [6080] WARN RPiSdn sdn.cpp:40 Using legacy SDN tuning - please consider moving SDN inside rpi.denoise
[6:37:52.604166024] [6080] INFO RPI vc4.cpp:447 Registered camera /base/soc/i2c0mux/i2c@1/imx477@1a to Unicam device /dev/media0 and ISP device /dev/media1
[6:37:52.604237746] [6080] INFO RPI pipeline_base.cpp:1121 Using configuration file '/usr/share/libcamera/pipeline/rpi/vc4/rpi_apps.yaml'
Preview window unavailable
[6:37:52.605105589] [6074] INFO Camera camera.cpp:1008 Pipeline handler in use by another process
ERROR: *** failed to acquire camera /base/soc/i2c0mux/i2c@1/imx477@1a ***

rpicam-hello --version
rpicam-apps build: v1.6.0 025ca84648c9 03-02-2025 (16:21:04)
rpicam-apps capabilites: egl:1 qt:1 drm:1 libav:1
libcamera build: v0.4.0+53-29156679

MrNbaphoto avatar Apr 22 '25 09:04 MrNbaphoto

Have you tested without the Arducam ethernet (really CAT5) extension?

I'm assuming it is https://docs.arducam.com/Camera-Extension-Solution/Ethernet-Extension-Kit/, in which case AIUI it is an active device which internally requires configuring for the CSI2 configuration. Arducam have prebaked some settings into it, but no-one else knows what those settings are. I can't even see a reference as to what chipset they are using.

6by9 avatar Apr 22 '25 10:04 6by9

Was running Snagless Cat6 cable FYI.

That is a negative, though my previous iteration ran PoE on that version of TJ Allsky and ran fine to the camera. This iteration ran for over a year with no problems at all until recently.

You are assuming correctly that is what I am using.

MrNbaphoto avatar Apr 22 '25 10:04 MrNbaphoto

As @6by9 mentioned, my first suggestion would be to remove the Arducam extension hardware from your setup and see if you still observe the lockup.

naushir avatar Apr 22 '25 13:04 naushir

@MrNbaphoto, Michael, To provide @naushir with the command being run, do this:

grep "Running:" /var/log/allsky.log | tail -1

That will display the last rpicam-still command. Make sure you copy the command and all the arguments after it.

@naushir, Most of the rpicam-still command lines from users have the same arguments in the same order, but obviously with different values. Users select settings like auto/manual exposure via a web interface, and Allsky creates and executes the command. The only "free form" field, which isn't used very often, is the "Extra Parameters" setting which is primarily for uncommon rpicam-still settings Allsky doesn't know about, like the focus-related ones.

[6:37:52.605105589] [6074] INFO Camera camera.cpp:1008 Pipeline handler in use by another process

Unless there's a bug in Allsky, it's very unlikely there will ever be multiple rpicam-still commands running at once. Before every image, the Allsky C program runs

system("pkill --signal SIGKILL rpicam-still");

EricClaeys avatar Apr 22 '25 23:04 EricClaeys

@EricClaeys I tried running that and it doesn't come up with anything at all?

grep "Running:" /var/log/allsky.log | tail -1

MrNbaphoto avatar Apr 23 '25 04:04 MrNbaphoto

@MrNbaphoto, sorry about that. Temporarily change the Debug Level in the WebUI to 2, wait a couple minutes and run the command again. The "Running" command is only logged at level 2 or higher.

EricClaeys avatar Apr 23 '25 05:04 EricClaeys

Thanks 👍

nbaphoto@nbaphoto:~ $ grep "Running:" /var/log/allsky.log | tail -1
2025-04-23T15:16:35.629431+10:00 nbaphoto allsky[19755]: > Running: rpicam-still --thumb none --output '/home/nbaphoto/allsky/tmp/image-20250423151635.jpg' --timeout 1 --nopreview --width 4056 --height 3040 --shutter 3982 --analoggain 1 --awb auto --quality 100

MrNbaphoto avatar Apr 23 '25 05:04 MrNbaphoto

@EricClaeys something interesting for you. Stopped and restarted Allsky, still had debug level 2. Got this message: Can't determine what command to use for RPi camera. (April 23, 4:08:12 pm AEST)

The image actually says "RPI camera command not found". (Image)

MrNbaphoto avatar Apr 23 '25 06:04 MrNbaphoto

When Allsky starts it runs a couple different commands to try and find the camera, but none of them worked. It's most likely due to whatever is causing rpicam-still to hang.

EricClaeys avatar Apr 23 '25 06:04 EricClaeys

The command line all looks reasonable to me.

My (very uneducated) suspicion is that the earlier rpicam-still process may be stalled because of a sensor device timeout. This causes subsequent rpicam-still processes to fail in this way. Is there any way of gathering logs for all runs so we can check this? A timeout error message will be displayed if this happens.

naushir avatar Apr 23 '25 09:04 naushir

@naushir, the Allsky log file records every rpicam-still command it runs, but we only keep stdout/stderr from rpicam-still from the last run. That's where this output:

[93:12:50.409301353] [47239] INFO Camera camera.cpp:1008 Pipeline handler in use by another process
ERROR: *** failed to acquire camera /base/soc/i2c0mux/i2c@1/imx477@1a ***

came from. What other information are you referring to with "Is there any way of gathering logs for all runs" ? I can probably give the users having this problem a modified Allsky that did more/different logging.
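
One way to keep the output of every run, rather than only the last, is to append each invocation's stdout and stderr to a single log. The following is a sketch under stated assumptions: `run_logged` is a hypothetical wrapper, not something Allsky provides, and the log path used when calling it is whatever the caller chooses.

```shell
# run_logged LOGFILE CMD [ARGS...]
# Append a timestamped header, the command's stdout and stderr, and its
# exit status to LOGFILE, then return the command's exit status.
run_logged() {
    logfile=$1
    shift
    {
        printf '=== %s: %s\n' "$(date '+%Y-%m-%dT%H:%M:%S%z')" "$*"
        "$@" 2>&1
        status=$?
        printf '=== exit status: %s\n' "$status"
    } >> "$logfile"
    return "$status"
}
```

Because the file is opened in append mode, a timeout message from an earlier run would still be in the log after later runs fail with "Pipeline handler in use by another process".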

If it makes any difference, Allsky actually calls rpicam-still with LIBCAMERA_LOG_LEVELS=ERROR,FATAL prepended to the command line.

Shouldn't running

system("pkill --signal SIGKILL rpicam-still");

kill any prior hung rpicam-still command? SIGKILL should kill anything, although I believe it's possible the process can be killed but, because no other process did a "wait()" on it, it becomes a zombie process that may still keep resources open.
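
Whether a process that survived SIGKILL is a zombie or is blocked inside the kernel can be read from its state letter. This is a minimal sketch, assuming Linux `/proc`; `proc_state` is a hypothetical helper name. As far as I know, a zombie (`Z`) has already closed its files and only occupies a process-table slot, whereas a process in uninterruptible sleep (`D`) typically cannot act on a signal until the kernel call it is waiting in returns.

```shell
# proc_state PID: print the one-letter kernel state of PID
# (R running, S sleeping, D uninterruptible sleep, Z zombie, T stopped),
# or print nothing and return 1 if the process does not exist.
proc_state() {
    [ -r "/proc/$1/status" ] || return 1
    # The line of interest looks like: "State:  S (sleeping)"
    while read -r key value rest; do
        if [ "$key" = "State:" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done 2>/dev/null < "/proc/$1/status"
    return 1
}
```

Running `proc_state` on the PID of the hung rpicam-still right after the `pkill` would show which of the two cases applies.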

EricClaeys avatar Apr 25 '25 20:04 EricClaeys

I've had the same issue, attached is the trace file for the process after it hung. Here's the version info:

Image

rpicam_debug_trace_479584.txt

brianboru82 avatar Apr 26 '25 13:04 brianboru82

@MrNbaphoto and @brianboru82 did you have a chance to run this with the sensor attached directly to the Raspberry Pi and without the Arducam Ethernet extender?

naushir avatar Apr 28 '25 07:04 naushir

@naushir my unit is sealed on the roof, hence the extension kit. Don't have a spare camera to test with the hat off

MrNbaphoto avatar Apr 28 '25 08:04 MrNbaphoto

We definitely want to check whether the preceding process hit a HW timeout event that caused this. In the absence of any other information, I believe this is what's happening.

When a timeout occurs, we attempt to restart the device, but this may not succeed, leaving the process locked up in an undetermined state. The change below will instead just quit the process with an exception. Would you be able to try this out and see if it unblocks the next process from running correctly?

diff --git a/apps/rpicam_still.cpp b/apps/rpicam_still.cpp
index 5e97aa79fbde..ba4797bfd5ab 100644
--- a/apps/rpicam_still.cpp
+++ b/apps/rpicam_still.cpp
@@ -222,10 +222,7 @@ static void event_loop(RPiCamStillApp &app)
                RPiCamApp::Msg msg = app.Wait();
                if (msg.type == RPiCamApp::MsgType::Timeout)
                {
-                       LOG_ERROR("ERROR: Device timeout detected, attempting a restart!!!");
-                       app.StopCamera();
-                       app.StartCamera();
-                       continue;
+                       throw std::runtime_error("ERROR: Device timeout detected, quitting!!!");
                }
                if (msg.type == RPiCamApp::MsgType::Quit)
                        return;
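
For users who cannot rebuild rpicam-apps, a related but different technique is an external watchdog that stops a capture that runs too long. The sketch below assumes the coreutils `timeout` command is available; `run_with_deadline` is a hypothetical name, and this is not the in-process change shown in the diff. Note that the deadline here is separate from rpicam-still's own `--timeout` option.

```shell
# run_with_deadline SECONDS CMD [ARGS...]
# Run CMD, sending SIGTERM after SECONDS and SIGKILL 5 seconds later if it
# is still alive. Returns the status reported by timeout(1).
run_with_deadline() {
    limit=$1
    shift
    timeout -k 5 "$limit" "$@"
    status=$?
    case $status in
        124) echo "timed out after ${limit}s: $*" >&2 ;;
        137) echo "ended by SIGKILL (possibly the follow-up kill): $*" >&2 ;;
    esac
    return "$status"
}
```

A process blocked in uninterruptible kernel sleep may still not exit, so this is a mitigation to try rather than a guaranteed fix.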

naushir avatar Apr 28 '25 08:04 naushir

Shouldn't running

system("pkill --signal SIGKILL rpicam-still");

kill any prior hung rpicam-still command? SIGKILL should kill anything, although I believe it's possible the process can be killed but, because no other process did a "wait()" on it, it becomes a zombie process that may still keep resources open.

SIGKILL should indeed kill it completely AFAIK. Perhaps it's worth adding a sleep call to wait a number of seconds before starting the new rpicam-still process as the kill may take a few seconds?
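
Instead of a fixed sleep, the wait could poll until the killed process has actually gone, with an upper bound. This is a sketch, assuming Linux `/proc`; `kill_and_wait` is a hypothetical helper that takes a PID, unlike the name-based `pkill` call Allsky uses.

```shell
# kill_and_wait PID MAX_SECONDS
# Send SIGKILL to PID, then poll once per second until the process is gone
# (or is only a zombie awaiting reaping). Returns 0 if gone, 1 otherwise.
kill_and_wait() {
    pid=$1
    remaining=$2
    kill -KILL "$pid" 2>/dev/null || true
    while :; do
        state=""
        if [ -r "/proc/$pid/status" ]; then
            while read -r key value rest; do
                [ "$key" = "State:" ] && state=$value && break
            done 2>/dev/null < "/proc/$pid/status"
        fi
        # No /proc entry (empty state) or zombie: the process has exited.
        case $state in
            ""|Z) return 0 ;;
        esac
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
}
```

It could be applied to every match of `pgrep -x rpicam-still`, for example with a 10 second limit per PID. A return value of 1 would indicate the process is still present after SIGKILL, which is worth logging before the next capture starts.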

naushir avatar Apr 28 '25 08:04 naushir

@naushir, How would someone get your modified program?

Where was the LOG_ERROR("ERROR... sent? That message never appeared in the rpicam-still output.

EricClaeys avatar Apr 28 '25 13:04 EricClaeys

@naushir, How would someone get your modified program?

Right now the users would have to manually apply the change to rpicam_still.cpp and rebuild/install locally. If this is not possible, I can provide a pre-built binary with this change.

Where was the LOG_ERROR("ERROR... sent? That message never appeared in the rpicam-still output.

The exception should be logged to stderr if I'm not mistaken.

naushir avatar Apr 28 '25 14:04 naushir

@naushir ,

The exception should be logged to stderr if I'm not mistaken.

That error message didn't appear in either user's output, so I assume their hang is caused by something else.

EricClaeys avatar Apr 28 '25 14:04 EricClaeys

Do you capture both stdout and stderr? I'm really only guessing where it logs to.

naushir avatar Apr 28 '25 14:04 naushir

@MrNbaphoto and @brianboru82 did you have a chance to run this with the sensor attached directly to the Raspberry Pi and without the Arducam Ethernet extender?

My setup has the HQ camera directly connected to the Raspberry Pi with the ribbon cable, not using any extender.

brianboru82 avatar Apr 28 '25 15:04 brianboru82

@naushir, How would someone get your modified program?

Right now the users would have to manually apply the change to rpicam_still.cpp and rebuild/install locally. If this is not possible, I can provide a pre-built binary with this change.

Where was the LOG_ERROR("ERROR... sent? That message never appeared in the rpicam-still output.

The exception should be logged to stderr if I'm not mistaken.

I could probably try this in the next couple days. Is everything I need to do in the command line in that code segment above? If it's not too much trouble, having a pre-built binary might be easier.

brianboru82 avatar Apr 28 '25 16:04 brianboru82

Do you capture both stdout and stderr? I'm really only guessing where it logs to.

Yes, we capture stdout and stderr.

EricClaeys avatar Apr 28 '25 17:04 EricClaeys

Okay, I built the rpicam-apps with the changes mentioned above. I'll let you know how it goes. The process hangs have been random, sometimes once a day, sometimes a week between.

brianboru82 avatar Apr 29 '25 04:04 brianboru82

@brianboru82 do you have any updates on your testing?

naushir avatar May 01 '25 06:05 naushir

It has been working fine since the change. I'll let you know if something happens. Let me know if there's any log or something that would be helpful.

brianboru82 avatar May 01 '25 14:05 brianboru82

@naushir, @brianboru82, No one's ever reported the ERROR: Device timeout detected, attempting a restart!!! message which implies that's never been the problem, so I'm not sure if the change will help.

@brianboru82, Any output from the rpicam-still command is written to ~/allsky/tmp/capture_RPi_debug.txt, so if there's a problem that's a good file to look in. Note that the file only has the LAST command's output.

EricClaeys avatar May 02 '25 06:05 EricClaeys

@EricClaeys do you record logs for every rpicam-still invocation in ~/allsky/tmp/capture_RPi_debug.txt? We are looking for a timeout error in the last run invocation I think.

naushir avatar May 02 '25 06:05 naushir