
Silent crash on latest commit

Open mrnetlex opened this issue 9 months ago • 19 comments

Dockcheck crashes on the latest commit (8dd1bba75b9f07276a399679498cc7541ea7505e). I only get something like this:

[screenshot of the error output]

If I checkout commit before (b5c03a2caa0aa7f3457a38d1b96fc28de1ae5c3a), everything works fine.

System info:

  • Ubuntu 22.04 x86
  • regctl installed as standalone binary
  • Docker version 28.0.4, build b8034c0

mrnetlex, Mar 30 '25 15:03

Thank you for reporting!

Odd. I don't get that on any of my test machines. None of them run Ubuntu, but they do cover Debian, Arch and Fedora. At a quick glance, one change related to xargs is this line:

Older commit: xargs ${XargsAsync} -I {} bash -c 'check_image "{}"' \
Latest: xargs $XargsAsync -I {} bash -c 'check_image "{}"' \

What if you try adding those curly braces around $XargsAsync? That's line 407 in the latest commit. Does anything change if you run with a different -x value, like ./dockcheck.sh -x 10, or 0 to disable it?
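For what it's worth, braces alone should not change anything here. `$XargsAsync` and `${XargsAsync}` expand identically, since the braces only mark where the variable name ends. When unquoted, both forms are word-split the same way. Below is a minimal check. It assumes a hypothetical value of `-P 10` for XargsAsync and does not reproduce the script's real option parsing.

```bash
#!/usr/bin/env bash
# Hypothetical value, e.g. what a -x 10 run might produce; not taken from dockcheck.
XargsAsync="-P 10"

# Unquoted expansions undergo word splitting. The braces only delimit the
# variable name, so both forms produce the same two words.
plain=( $XargsAsync )
braced=( ${XargsAsync} )

printf 'plain:  %d words: %s\n' "${#plain[@]}" "${plain[*]}"
printf 'braced: %d words: %s\n' "${#braced[@]}" "${braced[*]}"
```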

Edit: My bad. It's of course due to the bash option pipefail, though I'm not sure why. I'll keep digging.

mag37, Mar 30 '25 16:03

Any clues, @Thaurin? From what I've read about xargs and signal 13, it seems to happen when the pipe after xargs closes before xargs is done writing.
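For context, signal 13 is SIGPIPE on Linux. A process that writes to a pipe whose read end has already closed is killed by it. Bash reports such a death as exit status 128 + 13 = 141. The small demonstration below uses `yes` and `head` purely as stand-ins; they are unrelated to dockcheck itself.

```bash
#!/usr/bin/env bash
# head exits after one line; yes keeps writing and is then killed by SIGPIPE.
yes | head -n 1 > /dev/null
statuses=( "${PIPESTATUS[@]}" )   # copy before the next command overwrites PIPESTATUS

echo "yes exited with ${statuses[0]}, head with ${statuses[1]}"
# bash's kill -l maps a 128+N exit status back to the signal name.
echo "signal name: $(kill -l "${statuses[0]}")"
```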

mag37, Mar 30 '25 16:03

The current workaround is to redirect xargs' error output by adding 2>/dev/null to line 407, like this:

  xargs $XargsAsync -I {} bash -c 'check_image "{}"' 2>/dev/null \

This only "allows" xargs to continue despite this harmless error, though it would do the same for any other error in the same xargs. If @mrnetlex (or anyone else with the same issue) can test this, I'll make the change on main, since it seems like a good alternative.

Alternatively, remove pipefail at line 7, though that would affect the whole script.
Edit: I first suggested commenting it out (set -euo #pipefail), which was a mistake. If you go this route, remove it completely for now:

set -euo

mag37, Mar 30 '25 16:03

Thanks for the fast response. Unfortunately, I modified the script according to your instructions, like this:

diff --git a/dockcheck.sh b/dockcheck.sh
index 28ba27d..fc3a27c 100755
--- a/dockcheck.sh
+++ b/dockcheck.sh
@@ -404,7 +404,7 @@ while read -r line; do
   esac
 done < <( \
   docker ps $Stopped --filter "name=$SearchName" --format '{{.Names}}' | \
-  xargs $XargsAsync -I {} bash -c 'check_image "{}"' \
+  xargs $XargsAsync -I {} bash -c 'check_image "{}"' 2>/dev/null \
 )

I get no output from dockcheck.

Commenting out pipefail and reverting the changes on line 407 gets me something like this:

netlex@q957:~/dockcheck$ ./dockcheck.sh
allexport       off
braceexpand     on
emacs           off
errexit         on
errtrace        off
functrace       off
hashall         on
histexpand      off
history         off
ignoreeof       off
interactive-comments    on
keyword         off
monitor         off
noclobber       off
noexec          off
noglob          off
nolog           off
notify          off
nounset         on
onecmd          off
physical        off
pipefail        off
posix           off
privileged      off
verbose         off
vi              off
xtrace          off
netlex@q957:~/dockcheck$ xargs: bash: terminated by signal 13
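The option table above is most likely bash's response to `set -o` with no option name. Bash parses `set -euo` as `-e`, `-u` and a bare `-o`, and a bare `-o` prints the current settings instead of setting anything. That fits the dump, which shows errexit and nounset on and pipefail off. If the goal is to keep `-e` and `-u` without pipefail, `set -eu` is enough. Here is a quick check:

```bash
#!/usr/bin/env bash
# A bare -o (nothing after it) lists every option instead of setting one.
# The subshell keeps -e/-u from leaking into this script at this point.
dump=$(set -euo)
grep -E '^(errexit|nounset|pipefail)' <<< "$dump"

# Presumably what was intended when dropping pipefail:
set -eu
[[ $- == *e* && $- == *u* ]] && echo "errexit and nounset are on"
```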

mrnetlex, Mar 30 '25 17:03

Ah yes, commenting it out was a mistake. It should be removed:

set -euo

I don't know why the redirection in xargs would hide all output, though. Odd. I've just tested both options (though I don't have the issue to begin with), and the script runs fine on my end.

mag37, Mar 30 '25 17:03

On my second system (Ubuntu 24.04 on ARM) everything works fine. The only relevant difference I can think of is that it uses xargs 4.9.0 instead of 4.8.0.

mrnetlex, Mar 30 '25 18:03

I am also having this issue. None of the previously mentioned workarounds make a difference. I still get no output and the error message "xargs: bash: terminated by signal 13". Happy to try any other ideas.

ckambler, Mar 30 '25 18:03

> On my second system (Ubuntu 24.04 on ARM) everything works fine. The only relevant difference I can think of is that it uses xargs 4.9.0 instead of 4.8.0.

That's odd! xargs was in the previous commit too, so I can't understand what would've changed this.

> I am also having this issue. None of the previously mentioned workarounds make a difference. I still get no output and the error message "xargs: bash: terminated by signal 13". Happy to try any other ideas.

Thank you for testing - so frustrating to troubleshoot when "it works on my machine" 😆

I just spun up an Ubuntu LXC to test this further:

root@ubutest:~/dockcheck# lsb_release -a
Distributor ID: Ubuntu
Description:    Ubuntu 22.04 LTS
Release:        22.04
Codename:       jammy

root@ubutest:~/dockcheck# xargs --version
xargs (GNU findutils) 4.8.0

And here it works - both as is, and with the redirection and/or pipefail removed.

root@ubutest:~/dockcheck# ./dockcheck.sh -x 10
[##################################################] 2/2 

Containers on latest version:
dozzle
homer

No updates available, exiting.

mag37, Mar 30 '25 18:03

Here is the logic used with xargs, if anyone wants to troubleshoot with just a one-liner. It only prints each container name with an arrow prefix, like -> homer

while read -r line; do echo "-> $line"; done < <(docker ps --format '{{.Names}}' | xargs -P 10 -I {} bash -c 'echo "{}"')
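The same read-loop and xargs plumbing can also be exercised without Docker, with `printf` standing in for `docker ps`. The container names below are only placeholders. The strict options mirror what line 7 of the script appears to set, assuming it is `set -euo pipefail` as discussed above.

```bash
#!/usr/bin/env bash
set -euo pipefail   # assumed to match the script's line 7, for comparison

# printf stands in for: docker ps --format '{{.Names}}'
while read -r line; do
  echo "-> $line"
done < <(printf '%s\n' dozzle homer | xargs -P 10 -I {} bash -c 'echo "{}"')
```

With -P 10 the two lines may come back in either order.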

mag37, Mar 30 '25 18:03

I realized another change that might be loosely related: I changed all the echo calls in the check_image function to printf.

Here's the old version. It can be pasted right in place, starting at line 346, if anyone cares to test it:

check_image() {
  i="$1"
  local Excludes=($Excludes_string)
  for e in "${Excludes[@]}" ; do
    if [[ "$i" == "$e" ]]; then
      echo Skip $i
      return
    fi
  done

  local NoUpdates GotUpdates GotErrors
  ImageId=$(docker inspect "$i" --format='{{.Image}}')
  RepoUrl=$(docker inspect "$i" --format='{{.Config.Image}}')
  LocalHash=$(docker image inspect "$ImageId" --format '{{.RepoDigests}}')

  # Checking for errors while setting the variable
  if RegHash=$(${t_out} $regbin -v error image digest --list "$RepoUrl" 2>&1) ; then
    if [[ "$LocalHash" = *"$RegHash"* ]] ; then
      echo NoUpdates "$i"
    else
      if [[ -n "$DaysOld" ]] && ! datecheck ; then
        echo NoUpdates "+$i ${ImageAge}d"
      else
        echo GotUpdates "$i"
      fi
    fi
  else
    # Here the RegHash is the result of an error code
    echo GotErrors "$i - ${RegHash}"
  fi
}

mag37, Mar 30 '25 19:03

> Here is the logic used with xargs, if anyone wants to troubleshoot with just a one-liner. It only prints each container name with an arrow prefix, like -> homer
>
> while read -r line; do echo "-> $line"; done < <(docker ps --format '{{.Names}}' | xargs -P 10 -I {} bash -c 'echo "{}"')

This works just fine for me and returns a list of all containers.

The old check_image function doesn't change anything. I'm still getting xargs: bash: terminated by signal 13

mrnetlex, Mar 30 '25 19:03

> > Here is the logic used with xargs, if anyone wants to troubleshoot with just a one-liner. It only prints each container name with an arrow prefix, like -> homer
> > while read -r line; do echo "-> $line"; done < <(docker ps --format '{{.Names}}' | xargs -P 10 -I {} bash -c 'echo "{}"')
>
> This works just fine for me and returns a list of all containers.
>
> The old check_image function doesn't change anything. I'm still getting xargs: bash: terminated by signal 13

So it's neither xargs itself nor the function it calls.

What if you remove the set -euo on line 7, or specifically the -e? Actually, try removing the whole line for now.

Edit: Another thing to test is adding a check for the broken-pipe (SIGPIPE) exit. Rewrite it like this:

  xargs $XargsAsync -I {} bash -c 'check_image "{}" 2>/dev/null' || \
  { if [ "$(kill -l "$?")" = PIPE ]; then exit 0; else exit "$?"; fi; } \
)

So add the || and then the if-line after it. (I'm grasping at straws here, sorry. I wish I could reproduce it, understand it and fix it instead of rolling back.)

mag37, Mar 30 '25 19:03

Removing only the -e flag seems to fix the issue, and so does removing the whole line.

Replacing

xargs $XargsAsync -I {} bash -c 'check_image "{}"' \

with

  xargs $XargsAsync -I {} bash -c 'check_image "{}" 2>/dev/null' || \
  { if [ "$(kill -l "$?")" = PIPE ]; then exit 0; else exit "$?"; fi; } \

outputs:

xargs: bash: terminated by signal 13
./dockcheck.sh: line 409: kill: 125: invalid signal specification
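The `kill: 125` line has a likely explanation. When a command it ran is killed by a signal, GNU xargs prints "terminated by signal N" and exits with its own status 125. It does not pass along the child's 128 + 13, so `kill -l 125` has no signal name to return.

There are two more details in the snippet above. First, the `2>/dev/null` inside the `bash -c` quotes only silences check_image's own stderr, while the complaint comes from xargs itself. Second, in the else branch, `exit "$?"` exits with the status of the `[` test rather than the status of xargs.

The sketch below rewrites the guard around 125. A self-signalling child serves as a hypothetical stand-in for check_image, and `run_checks` is likewise a hypothetical name. Note that 125 covers any fatal signal, not only SIGPIPE.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for check_image: the child shell kills itself with
# SIGPIPE, which is what a write to a closed pipe would do to it.
printf '%s\n' one | xargs -I {} bash -c 'kill -PIPE $$' 2>/dev/null
status=$?
echo "xargs exit status: $status"   # GNU xargs reports 125 here, not 141

# Guard sketch: capture the status first, before any other command replaces $?.
run_checks() {
  printf '%s\n' one | xargs -I {} bash -c 'kill -PIPE $$' 2>/dev/null || {
    local rc=$?
    if [ "$rc" -eq 125 ]; then
      return 0        # a child died from a signal; treated as harmless here
    fi
    return "$rc"      # propagate the real xargs status, not the test's
  }
}
run_checks && echo "guard swallowed status 125"
```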


(I'm grateful that you're willing to spend your time on a bug that you can't even replicate.)

mrnetlex, Mar 30 '25 20:03

> Removing only the -e flag seems to fix the issue, and so does removing the whole line.

Wonderful, thank you! That's the quick fix I needed. I'll try to reproduce this myself so I can keep working on the "why".

I'll remove the -e for now, then. Did it work without -e but with pipefail, i.e. set -uo pipefail?

mag37, Mar 30 '25 20:03

Yes, with pipefail but without -e, and nothing else modified. I'm still trying to figure out what is specific about my setup that makes it fail. Does anybody with the same issue use Homebrew on their system?

mrnetlex, Mar 30 '25 20:03

Superb, thanks a lot for troubleshooting and reporting back. I'll push a hotfix in a minute.

I'll set up more testing environments to try to reproduce it. Someone else just reported an unbound variable bug (#150) that I didn't hit in any of my tests either. My testing is clearly flawed.

mag37, Mar 30 '25 20:03

Removing the -e worked for me! I just pulled the 0.6.1 update and it seems to work now. Thank you!

ckambler, Mar 30 '25 21:03

> Any clues, @Thaurin? From what I've read about xargs and signal 13, it seems to happen when the pipe after xargs closes before xargs is done writing.

Seems like a broken pipe, yes; xargs keeps writing to it, but the pipe ain't listening anymore.

> The current workaround is to redirect xargs' error output by adding 2>/dev/null to line 407, like this:
>
> xargs $XargsAsync -I {} bash -c 'check_image "{}"' 2>/dev/null \

> Removing only the -e flag seems to fix the issue, and so does removing the whole line.

So it sounds like one of the xargs instances returned an error (non-zero) code. Because of set -e, that makes the script exit immediately and closes the pipe while xargs or its other instances are still trying to write to it, which results in a broken pipe error? What error does xargs throw? The broken pipe error comes from bash, right?
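This hypothesis can be reproduced in isolation. Treat the sketch below as an illustration of the mechanism, not a confirmed diagnosis of dockcheck. Under `set -e`, a failing command inside the reading loop ends the script, and the read end of the process substitution closes. The next write from an xargs child is then killed by SIGPIPE. Here `false` is a hypothetical stand-in for whatever actually fails, and `sleep 1` only makes the late write deterministic.

```bash
#!/usr/bin/env bash
# The script under test runs in its own bash so its set -e exit is contained.
repro='
set -e
while read -r line; do
  echo "-> $line"
  false   # hypothetical failing command: set -e exits the script here
done < <(printf "%s\n" first | xargs -I {} bash -c "echo {}; sleep 1; echo {} again")
'
# $(...) only returns once xargs, which shares this stderr, has exited,
# so its complaint about the killed child is captured as well.
output=$(bash -c "$repro" 2>&1)
printf '%s\n' "$output"
```

Without the capture, in an interactive terminal, the xargs line would typically show up after the next prompt, much like in the output mrnetlex posted earlier.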

> (I'm grateful that you're willing to spend your time on a bug that you can't even replicate.)

Me too, because I introduced these bugs, lol.

Thaurin, Mar 31 '25 12:03

> Seems like a broken pipe, yes; xargs keeps writing to it, but the pipe ain't listening anymore.
>
> So it sounds like one of the xargs instances returned an error (non-zero) code. Because of set -e, that makes the script exit immediately and closes the pipe while xargs or its other instances are still trying to write to it, which results in a broken pipe error? What error does xargs throw? The broken pipe error comes from bash, right?

Yeah, the broken pipe error does come from bash. I guess to get a "debug log" you'd have to add set -x in two places: just outside the xargs (e.g. line 393, before the while read) and inside the called function (line 351, just inside check_image()). I've tried to understand the underlying issue, namely what's writing to and reading from the broken pipe.
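Tracing is needed in both places because each `bash -c` that xargs starts is a fresh shell. Shell options set in the parent, such as `-x`, are not inherited unless SHELLOPTS is exported, so tracing has to be switched on inside the function as well. Here is a minimal sketch. `check_item` is a hypothetical stand-in for check_image, which presumably is exported in a similar way, since `bash -c` can call it.

```bash
#!/usr/bin/env bash
set -euo pipefail

check_item() {         # hypothetical stand-in for check_image
  set -x               # trace this child shell; trace lines go to stderr
  printf 'checked %s\n' "$1"
}
export -f check_item   # lets the bash -c children started by xargs see it

set -x                 # traces the parent's commands, not the children's
printf '%s\n' a b | xargs -P 2 -I {} bash -c 'check_item "{}"'
set +x
```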

I just did another test with a locally built container that would fail the check and throw an error - but that works as intended:

Containers with errors, won't get updated:
local-flask - failed to request manifest head docker.io/library/local-local-flask:latest: unauthorized
info: 'unauthorized' often means not found in a public registry.

It prints the predefined message and continues with the rest of the script without issues.


> (I'm grateful that you're willing to spend your time on a bug that you can't even replicate.)
>
> Me too, because I introduced these bugs, lol.

Well, the xargs change was a great addition imo, with many happy reports about it. It just brought some hard-to-troubleshoot issues along with it 😁, especially when I try to "harden" bash a little with safer options.

mag37, Mar 31 '25 12:03

Closing this since it's stale and won't be worked on. I've also read (and been advised) that the set -e option is best avoided.

mag37, Aug 05 '25 21:08