Rpi 4 8gb + HQ Cam + Thomas Jacquin Allsky + Arducam Ethernet Extension
Hi everyone,
I am having an issue where image capture hangs: after stopping Allsky and rebooting, I get a small handful of photos and then it stops. I thought I was having an issue with the RJ45 cable, so I replaced it, with no change. I then formatted the microSD card and did a fresh install of the newest version of Raspberry Pi OS and also TJ Allsky. Same issue. I have done a fair bit of googling and tried a few things, but to no avail.
I have had some discussions on the Allsky GitHub page: https://github.com/AllskyTeam/allsky/discussions/4409
We have exhausted options and I was suggested to report it here in hope of some help.
I am very basic when it comes to Linux, so if you can be a little gentle and explain how to do things, that would be great. I am logging into the Pi via SSH until I get an HDMI to Micro HDMI cable to connect it to a screen.
Rpi 4 8gb, Rpi HQ Cam, TJ Allsky Current Version, Arducam Ethernet extension, Fresh Formatted 128gb Card, Fresh Install 64bit PI with Desktop
nbaphoto@nbaphoto:~ $ strace -p 71361
strace: Process 71361 attached
futex(0x7fe6f406fc, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY^Cstrace: Process 71361 detached
 <detached ...>
Thanks in advance. Micheal
From https://github.com/AllskyTeam/allsky/discussions/4409:
[93:12:50.409301353] [47239] INFO Camera camera.cpp:1008 Pipeline handler in use by another process
ERROR: *** failed to acquire camera /base/soc/i2c0mux/i2c@1/imx477@1a ***
This suggests (as mentioned in the above discussion) that one of the earlier rpicam-still processes has stalled. Can you run rpicam-hello --version and provide the output please? Also, can you confirm the full command line of the process that has stalled?
rpicam-hello
[6:37:52.560198419] [6074] INFO Camera camera_manager.cpp:327 libcamera v0.4.0+53-29156679
[6:37:52.601738308] [6080] WARN RPiSdn sdn.cpp:40 Using legacy SDN tuning - please consider moving SDN inside rpi.denoise
[6:37:52.604166024] [6080] INFO RPI vc4.cpp:447 Registered camera /base/soc/i2c0mux/i2c@1/imx477@1a to Unicam device /dev/media0 and ISP device /dev/media1
[6:37:52.604237746] [6080] INFO RPI pipeline_base.cpp:1121 Using configuration file '/usr/share/libcamera/pipeline/rpi/vc4/rpi_apps.yaml'
Preview window unavailable
[6:37:52.605105589] [6074] INFO Camera camera.cpp:1008 Pipeline handler in use by another process
ERROR: *** failed to acquire camera /base/soc/i2c0mux/i2c@1/imx477@1a ***
rpicam-hello --version
rpicam-apps build: v1.6.0 025ca84648c9 03-02-2025 (16:21:04)
rpicam-apps capabilites: egl:1 qt:1 drm:1 libav:1
libcamera build: v0.4.0+53-29156679
Have you tested without the Arducam ethernet (really CAT5) extension?
I'm assuming it is https://docs.arducam.com/Camera-Extension-Solution/Ethernet-Extension-Kit/, in which case AIUI it is an active device which internally requires configuring for the CSI2 configuration. Arducam have prebaked some settings into it, but no-one else knows what those settings are. I can't even see a reference as to what chipset they are using.
Was running Snagless Cat6 cable FYI.
That is a negative. My previous iteration, though, ran PoE with that version of TJ Allsky and worked fine to the camera. This iteration ran for over a year with no problems at all until recently.
You are assuming correctly that is what I am using.
As @6by9 mentioned, my first suggestion would be to remove the Arducam extension hardware from your setup and see if you still observe the lockup.
@MrNbaphoto, Michael, To provide @naushir with the command being run, do this:
grep "Running:" /var/log/allsky.log | tail -1
That will display the last rpicam-still command. Make sure you copy the command and all the arguments after it.
@naushir, Most of the rpicam-still command lines from users have the same arguments in the same order, but obviously with different values. Users select settings like auto/manual exposure via a web interface, and Allsky creates and executes the command. The only "free form" field, which isn't used very often, is the "Extra Parameters" setting which is primarily for uncommon rpicam-still settings Allsky doesn't know about, like the focus-related ones.
[6:37:52.605105589] [6074] INFO Camera camera.cpp:1008 Pipeline handler in use by another process
Unless there's a bug in Allsky, it's very unlikely there will ever be multiple rpicam-still commands running at once. Before every image, the Allsky C program runs
system("pkill --signal SIGKILL rpicam-still");
@EricClaeys I tried running that and it doesn't come up with anything at all?
grep "Running:" /var/log/allsky.log | tail -1
@MrNbaphoto, sorry about that.
Temporarily change the Debug Level in the WebUI to 2, wait a couple minutes and run the command again. The "Running" command is only logged at level 2 or higher.
Thanks 👍
nbaphoto@nbaphoto:~ $ grep "Running:" /var/log/allsky.log | tail -1
2025-04-23T15:16:35.629431+10:00 nbaphoto allsky[19755]: > Running: rpicam-still --thumb none --output '/home/nbaphoto/allsky/tmp/image-20250423151635.jpg' --timeout 1 --nopreview --width 4056 --height 3040 --shutter 3982 --analoggain 1 --awb auto --quality 100
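To check whether the logged command also hangs when run by hand, something like the following could be used. This is a minimal sketch, assuming GNU coreutils `timeout` is installed; the helper names `extract_running_cmd` and `run_with_limit` are hypothetical and are not part of Allsky or rpicam-apps. The first helper strips the syslog prefix from a "Running:" line, the second runs a command under a time limit and reports when the limit was hit.

```shell
# Strip the syslog prefix from an Allsky "Running:" log line, leaving only
# the command that follows it. Reads log lines on stdin.
# (Hypothetical helper name, for illustration only.)
extract_running_cmd() {
    sed -n 's/^.*> Running: //p'
}

# Run a command with a time limit and report a suspected hang.
# Assumes GNU coreutils `timeout`: SIGTERM is sent after LIMIT seconds and
# SIGKILL 5 seconds later if the command is still alive.
# Usage: run_with_limit LIMIT_SECONDS COMMAND [ARGS...]
run_with_limit() {
    limit=$1
    shift
    timeout -k 5 "$limit" "$@"
    status=$?
    case "$status" in
        # 124: the limit was reached; 137: something sent SIGKILL
        # (timeout's own follow-up kill, or an external pkill).
        124|137) echo "command did not finish within ${limit}s (status $status)" >&2 ;;
    esac
    return "$status"
}
```

Piping the `grep "Running:" /var/log/allsky.log | tail -1` output through `extract_running_cmd` prints just the rpicam-still command, which can then be pasted after `run_with_limit 60`. Allsky probably needs to be stopped first, since it kills any running rpicam-still before each image.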
@EricClaeys something interesting for you. Stopped and restarted Allsky, still had debug level 2. Got this message: Can't determine what command to use for RPi camera. (April 23, 4:08:12 pm AEST)
Image actually says RPI camera command not found
When Allsky starts it runs a couple different commands to try and find the camera, but none of them worked. It's most likely due to whatever is causing rpicam-still to hang.
The command line all looks reasonable to me.
My (very uneducated) suspicion is that the earlier rpicam-still process may be stalled because of a sensor device timeout. This causes subsequent rpicam-still processes to fail in this way. Is there any way of gathering logs for all runs so we can check this? A timeout error message will be displayed if this happens.
@naushir, the Allsky log file records every rpicam-still command it runs, but we only keep stdout/stderr from rpicam-still from the last run. That's where this output:
[93:12:50.409301353] [47239] INFO Camera camera.cpp:1008 Pipeline handler in use by another process
ERROR: *** failed to acquire camera /base/soc/i2c0mux/i2c@1/imx477@1a ***
came from. What other information are you referring to with "Is there any way of gathering logs for all runs"? I can probably give the users having this problem a modified Allsky that did more/different logging.
If it makes any difference, Allsky actually calls rpicam-still with LIBCAMERA_LOG_LEVELS=ERROR,FATAL prepended to the command line.
Shouldn't running
system("pkill --signal SIGKILL rpicam-still");
kill any prior hung rpicam-still command? SIGKILL should kill anything, although I believe it's possible for the process to be killed but, because no other process did a "wait()" on it, become a zombie process that may still keep resources open.
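One way to see what state a leftover rpicam-still is actually in is to look at its `ps` state code. The following is a minimal sketch, assuming the procps version of `ps` found on Raspberry Pi OS; the function names `describe_state` and `report_states` are hypothetical. As far as I know, a zombie (state Z) has normally already closed its files, whereas a process in uninterruptible sleep (state D) does not act on SIGKILL until its kernel call returns, so the two cases are worth telling apart.

```shell
# Translate the first character of a ps(1) STAT field into a description.
# (Hypothetical helper name, for illustration only.)
describe_state() {
    case "$1" in
        R*) echo "running" ;;
        S*) echo "interruptible sleep" ;;
        D*) echo "uninterruptible sleep" ;;
        Z*) echo "zombie" ;;
        T*|t*) echo "stopped" ;;
        *) echo "other" ;;
    esac
}

# Print PID, state and command line for every process whose name matches.
# Assumes procps `ps` (the -C option selects by command name).
# Usage: report_states rpicam-still
report_states() {
    ps -C "$1" -o pid=,stat=,args= | while read -r pid stat args; do
        echo "$pid $stat ($(describe_state "$stat")): $args"
    done
}
```

Running `report_states rpicam-still` while captures are failing would show whether an old process is still present and in which state; no output means no process with that name exists.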
I've had the same issue, attached is the trace file for the process after it hung. Here's the version info:
@MrNbaphoto and @brianboru82, did you have a chance to run this with the sensor attached directly to the Raspberry Pi and without the Arducam Ethernet extender?
@naushir my unit is sealed on the roof, hence the extension kit. Don't have a spare camera to test with the hat off
We definitely want to check if the preceding process has had a HW timeout event occurring to cause this. With the absence of any other information, I believe this is what's happening.
When a timeout occurs, we attempt to restart the device, but this may not be successful, causing the process to lock up in an undetermined state. The change below will instead just quit the process with an exception. Would you be able to try this out and see if it unblocks the next process from running correctly?
diff --git a/apps/rpicam_still.cpp b/apps/rpicam_still.cpp
index 5e97aa79fbde..ba4797bfd5ab 100644
--- a/apps/rpicam_still.cpp
+++ b/apps/rpicam_still.cpp
@@ -222,10 +222,7 @@ static void event_loop(RPiCamStillApp &app)
         RPiCamApp::Msg msg = app.Wait();
         if (msg.type == RPiCamApp::MsgType::Timeout)
         {
-            LOG_ERROR("ERROR: Device timeout detected, attempting a restart!!!");
-            app.StopCamera();
-            app.StartCamera();
-            continue;
+            throw std::runtime_error("ERROR: Device timeout detected, quitting!!!");
         }
         if (msg.type == RPiCamApp::MsgType::Quit)
             return;
Shouldn't running
system("pkill --signal SIGKILL rpicam-still");
kill any prior hung rpicam-still command? SIGKILL should kill anything, although I believe it's possible for the process to be killed but, because no other process did a "wait()" on it, become a zombie process that may still keep resources open.
SIGKILL should indeed kill it completely AFAIK. Perhaps it's worth adding a sleep call to wait a number of seconds before starting the new rpicam-still process as the kill may take a few seconds?
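Rather than a fixed sleep, the wait could poll until the old process has really gone. This is a minimal sketch of that idea in shell, assuming procps `pkill` and `pgrep`; the function name `kill_and_wait` is hypothetical and this is not how Allsky currently does it.

```shell
# Send SIGKILL to every process whose name matches exactly, then poll once
# a second until none remain or the time limit expires.
# Returns 0 once no matching process is left, 1 if one is still present.
# (Hypothetical helper name, for illustration only.)
# Usage: kill_and_wait NAME [LIMIT_SECONDS]
kill_and_wait() {
    name=$1
    limit=${2:-5}
    pkill --signal KILL -x "$name"
    waited=0
    while pgrep -x "$name" >/dev/null; do
        [ "$waited" -ge "$limit" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A caller would run `kill_and_wait rpicam-still 10` before starting the next capture and log a message if it returns 1, which would show directly whether an old process outlives the kill.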
@naushir, How would someone get your modified program?
Where was the LOG_ERROR("ERROR... sent? That message never appeared in the rpicam-still output.
@naushir, How would someone get your modified program?
Right now the users would have to manually apply the change to rpicam_still.cpp and rebuild/install locally. If this is not possible, I can provide a pre-built binary with this change.
Where was the LOG_ERROR("ERROR... sent? That message never appeared in the rpicam-still output.
The exception should be logged to stderr if I'm not mistaken.
@naushir ,
The exception should be logged to stderr if I'm not mistaken.
That error message didn't appear in either user's output, so I assume their hang is caused by something else.
Do you capture both stdout and stderr as I'm really guessing where it logs to.
@MrNbaphoto and @brianboru82, did you have a chance to run this with the sensor attached directly to the Raspberry Pi and without the Arducam Ethernet extender?
My setup has the HQ camera directly connected to the Raspberry Pi with the ribbon cable, not using any extender.
I could probably try this in the next couple days. Is everything I need to do in the command line in that code segment above? If it's not too much trouble, having a pre-built binary might be easier.
Do you capture both stdout and stderr as I'm really guessing where it logs to.
Yes, we capture stdout and stderr.
Okay, I built rpicam-apps with the changes mentioned above. I'll let you know how it goes. The process hangs have been random, sometimes once a day, sometimes a week apart.
@brianboru82 do you have any updates on your testing?
It has been working fine since the change. I'll let you know if something happens. Let me know if there's any log or something that would be helpful.
@naushir, @brianboru82,
No one's ever reported the ERROR: Device timeout detected, attempting a restart!!! message which implies that's never been the problem, so I'm not sure if the change will help.
@brianboru82,
Any output from the rpicam-still command is written to ~/allsky/tmp/capture_RPi_debug.txt, so if there's a problem that's a good file to look in. Note that the file only has the LAST command's output.
@EricClaeys do you record logs for every rpicam-still invocation in ~/allsky/tmp/capture_RPi_debug.txt? We are looking for a timeout error in the last run invocation I think.
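If output from every run is wanted rather than only the last one, a wrapper along these lines could append each run to a cumulative file. This is a minimal sketch; the function name `run_and_append` and any log file path passed to it are hypothetical, and Allsky itself does not do this as far as the thread shows.

```shell
# Run a command, appending its stdout and stderr to a cumulative log file
# under a timestamped header, so output from earlier runs is kept.
# Returns the command's own exit status.
# (Hypothetical helper name, for illustration only.)
# Usage: run_and_append LOGFILE COMMAND [ARGS...]
run_and_append() {
    logfile=$1
    shift
    printf '=== %s: %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$*" >>"$logfile"
    "$@" >>"$logfile" 2>&1
    status=$?
    printf '=== exit status: %s\n' "$status" >>"$logfile"
    return "$status"
}
```

With something like this in place, a timeout message from a run before the one that reports "Pipeline handler in use by another process" would still be in the file, along with the time and exit status of each run.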