ShellCheck busy loops and allocates memory until it is killed by OOM killer

Open kdudka opened this issue 1 year ago • 12 comments

For bugs

My shellcheck version (shellcheck --version or "online"): 0.9.0
Originally reported in Red Hat Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2180035

Here's a snippet or screenshot that shows the problem:

A (gzipped) script that triggers the issue is attached: testsuite.gz

I was not able to isolate the problem to a small shell script. Perhaps the attached shell script is just too long/complex for shellcheck to process it?

Here's what shellcheck currently says:

shellcheck busy loops and allocates memory until it is killed by OOM killer.

Here's what I wanted or expected to see:

shellcheck should print static analysis results and exit successfully.

Mar 22 '23 09:03 kdudka

Probably the same as #2652 (the guessed reason is in a comment there).

This is undoubtedly due to ShellCheck's new data flow analysis engine. It takes great care to be acceptably fast even for larger scripts, but 22,000 lines is something else.

You have about 307 742 lines in that test file.

Mar 23 '23 08:03 brother

@brother Thanks for the pointer! Unfortunately, in our case the problem happens with shellcheck-0.7.2, too.

Mar 23 '23 08:03 kdudka

Same problem here. It persists even after I close nvim, and I have to forcibly power off.

Aug 20 '23 08:08 ogios

The same problem happened while running shellcheck on a configure script (7.6k, 20986 lines).

Jan 20 '24 17:01 nukemiko

I have encountered the same issue with huge generated scripts made by autoconf.

I had a look at the attached testsuite.gz file. It turns out to be generated by autotest (part of the autotools suite, just as autoconf). Some searching indicates that this is based on https://github.com/firewalld/firewalld/blob/main/src/tests/testsuite.at, but that is not really relevant.

I bisected the file to try and understand what caused the issue. The large file itself was not a problem; if you cut the file in half, one half completes shellcheck in 0.1 seconds, while the other spins forever consuming more and more memory. (I hit ctrl-c after ~10 seconds as an indicator of whether or not the problem was present).

I managed to narrow it down to a single line that determines whether shellcheck hangs or not. It is an opening (, which has no obvious matching closing ) (not one that I could easily find, anyway).
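The bisection described above could be automated roughly as follows. This is a minimal sketch, not the procedure actually used in this comment: the names CHECKER, LIMIT, hangs_on_prefix and first_hanging_prefix are hypothetical, it relies on the timeout utility from GNU coreutils (not installed by default on macOS), and it assumes that once a prefix of the file hangs, every longer prefix hangs too.

```shell
#!/bin/bash
# Sketch only: CHECKER, LIMIT and both function names are illustrative.
CHECKER=${CHECKER:-shellcheck}   # command to run on each candidate file
LIMIT=${LIMIT:-10}               # seconds before a run counts as "hanging"

# Succeeds if the checker is still running after LIMIT seconds
# on the first N lines of FILE.
hangs_on_prefix() {
  local file=$1 lines=$2 tmp status
  tmp=$(mktemp) || return 2
  head -n "$lines" "$file" > "$tmp"
  timeout "$LIMIT" "$CHECKER" "$tmp" > /dev/null 2>&1
  status=$?
  rm -f "$tmp"
  [ "$status" -eq 124 ]   # GNU timeout exits with 124 when the limit is hit
}

# Binary search for the shortest prefix that hangs. Assumes that once a
# prefix hangs, every longer prefix hangs too; prints the total line count
# if no shorter prefix hangs.
first_hanging_prefix() {
  local file=$1 lo=1 hi mid
  hi=$(( $(wc -l < "$file") ))
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi) / 2 ))
    if hangs_on_prefix "$file" "$mid"; then
      hi=$mid
    else
      lo=$(( mid + 1 ))
    fi
  done
  echo "$lo"
}
```

Calling first_hanging_prefix testsuite would print a candidate line number to inspect by hand. Because cutting a shell script at an arbitrary line changes its syntax, the result is only a hint about where to look.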

Jan 31 '24 13:01 magicus

Here is the patch that turns testsuite from a CPU/memory hogger to completing in 0.2 seconds on my computer:

--- testsuite	2024-01-31 14:15:15
+++ testsuite.working	2024-01-31 14:16:06
@@ -2326,7 +2326,6 @@
 at_fn_group_banner 1 'firewall-cmd.at:5' \
   "basic options" "                                  " 1
 at_xfail=no
-(
   printf "%s\n" "1. $at_setup_line: testing $at_desc ..."
   $at_traceon

Jan 31 '24 13:01 magicus

It is not just the matching of ( itself that is problematic. I had a hunch that a long search for the closing parenthesis could trigger this bug, so I created this script to generate a huge test file enclosed within a (...) pair. But it worked fine with shellcheck.

#!/bin/bash
# Generate a huge script whose entire body sits inside one ( ... ) pair.
SCRIPT=foo.sh
rm -f "$SCRIPT"
echo '#!/bin/bash' > "$SCRIPT"
echo "(" >> "$SCRIPT"
for i in {1..1000000} ; do
  echo "echo Line $i" >> "$SCRIPT"
done
echo ")" >> "$SCRIPT"

Jan 31 '24 13:01 magicus

As I said in #2652, this does not seem to be related to the extended analysis.

Furthermore, when looking closely at the memory consumption, I see that it increases not gradually but in huge discrete steps. My guess is that a single array is being reallocated over and over as it grows without bound.
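One way to make such step-wise growth visible is to sample the resident set size of the process at a fixed interval. The following is a minimal sketch under stated assumptions: sample_rss is a hypothetical helper name, and it assumes a ps that accepts -o rss= and -p (procps on Linux and the BSD ps on macOS do; some minimal ps implementations do not).

```shell
#!/bin/bash
# Sketch only: sample_rss is an illustrative helper, not part of shellcheck.
# Runs a command in the background and prints its resident set size (as
# reported by ps, normally in KiB) once per interval until it exits.
sample_rss() {
  local interval=$1
  shift
  "$@" > /dev/null 2>&1 &
  local pid=$! rss
  while kill -0 "$pid" 2>/dev/null; do
    rss=$(ps -o rss= -p "$pid" 2>/dev/null | tr -d ' ')
    if [ -n "$rss" ]; then
      echo "$rss"
    fi
    sleep "$interval"
  done
  wait "$pid"   # propagate the exit status of the command
}

# Example:
#   sample_rss 1 shellcheck testsuite
```

Large jumps between consecutive samples, rather than a smooth ramp, would be consistent with a single structure being repeatedly reallocated, although this alone does not identify which one.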

Feb 06 '24 08:02 magicus

I have turned my eye toward the script that is problematic for me. (This is the shell script generated by autoconf for the OpenJDK project.) I have attached the script in question here: autoconf-reproducer.zip

When I run this on my M1 mac, it seemingly never ends. After about 1 minute of running, at which point it consumed > 6 GB RAM, I aborted it.

However, when a single line is deleted from the file, the analysis finished in 4-5 seconds (which is reasonable, considering that the file is ~ 170k lines), and I could not even measure the RAM consumption.

The patch that does this magic trick is here:

--- autoconf-BAD        2024-02-06 10:57:23
+++ autoconf-GOOD       2024-02-06 10:57:32
@@ -25504,7 +25504,6 @@
     fi # end with or without slashes
 
     # Now we have a usable command as new_path, with arguments in arguments
-    if test "x$OPENJDK_BUILD_OS" = "xwindows"; then
       if test "x$fixpath_prefix" = x; then
         # Only mess around if fixpath_prefix was not given

Now I realize that this deletes an opening if statement, and that will throw off all subsequent syntax. However, my point here is that it is not reasonable for shellcheck to handle one file perfectly fine, but fail miserably when a single line is added.

So the issue is not really the huge size of the script per se. I understand that a 170k-line script is massive, but without the problematic construct this works perfectly fine.

My guess is that there is some special construct in here that triggers bad behavior in shellcheck. It seems like the complexity in both time and memory grows as O(n^x) in the number of lines in the file when this construct is encountered.

Hence the large reproducer. A smaller reproducer would not show the bad complexity convincingly. I've tried my best to get it down to a single line difference. My hope is that someone more versed in shellcheck debugging can figure out exactly what problem this additional line provokes. I am a complete noob at Haskell; sorry. Otherwise I'd have tried running the "bad" file for a while and then checking which array it is that is growing without bound. That would probably give a decent clue to what the problem is. Maybe @koalaman can have a look?

Feb 06 '24 10:02 magicus

I also tried this with the latest version, v0.9.0-99-gd80fdfa, and running as shellcheck --extended-analysis=false autoconf-BAD.

This gave a better result -- the memory consumption stayed below 3 GB, and the command actually finished after somewhat more than 2 minutes. (Just to confirm, I re-ran without --extended-analysis=false and waited 3 minutes. It was still running by then, and had consumed 20 GB memory so I had to kill it.)

However, it is still a far cry from the patched file, which finishes in 5 seconds, both with and without --extended-analysis=false.

So whatever the problem is, it is aggravated by the extended analysis, but it definitely exists even without it.
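The comparison above can be repeated with a small timing wrapper. This is a minimal sketch: time_run is a hypothetical helper name, it measures wall-clock time only with one-second resolution, and it does not measure memory. The flag --extended-analysis=false is the one used in the runs described above.

```shell
#!/bin/bash
# Sketch only: time_run is an illustrative helper.
# Runs a command, discards its output, and prints the elapsed wall-clock
# seconds together with the exit status and the command line.
time_run() {
  local start end status
  start=$(date +%s)
  "$@" > /dev/null 2>&1
  status=$?
  end=$(date +%s)
  echo "$(( end - start ))s exit=$status: $*"
}

# Example:
#   time_run shellcheck autoconf-BAD
#   time_run shellcheck --extended-analysis=false autoconf-BAD
```

Since the run without the flag may not finish at all, it is probably sensible to wrap the command in a time limit of some kind when trying this.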

Feb 06 '24 10:02 magicus

I am having a similar issue when editing my .bashrc with nvim + mason + bash-language-server (which calls shellcheck for linting). After exploring a bit, I found that it is caused by a reference that eventually leads to the file /usr/share/nvm/nvm.sh, which is only 4000 lines of code but takes 10 seconds and 2.6G memory to analyze.

Unlike the result from @magicus (https://github.com/koalaman/shellcheck/issues/2721#issuecomment-1929204173), setting --extended-analysis=false helps with my issue (1.3 seconds and 130M memory). I'm not an expert in shell scripting, so I failed to pin down the part causing the issue, even after bisecting the script.

A workaround is to add the line # shellcheck external-sources=false before the line causing the problem.

OS: Arch Linux, with kernel Linux 6.8.9-arch1-1 x86_64
shellcheck version: 0.10.0
script causing the issue: nvm.zip (from nvm 0.39.7)

May 08 '24 16:05 LucunJi
