
ShellCheck Data Flow Analysis Update

Open koalaman opened this issue 2 years ago • 0 comments

tl;dr: If your bug was closed as a duplicate of this one, it was fixed by ShellCheck's new Data Flow Analysis (DFA) engine.

What's new?

It took hundreds of hours more work than expected, but f77a545282 introduces a working Data Flow Analysis engine. This is the largest infrastructure update in ShellCheck's history!

This update will allow adding several requested features (like integer variables), implementing a whole new family of checks, and drastically improving specific existing ones as they're gradually migrated.

How will Data Flow Analysis improve checks?

As an example, the first check to take advantage is SC2086 for unquoted variables.

This check tries to avoid unnecessary noise by not triggering for variables without spaces or metacharacters. Unfortunately, up until now, ShellCheck's only way to determine the contents of a variable was to read the file line by line and find the most recent line that assigned to that variable.

This has the fundamental flaw of ignoring control flow, with the most common buggy pattern being:

if condition
then
  n="$(foo)"    # Most recently *executed* assignment
else
  n=0           # Most recent *line number* with an assignment
fi
echo $n         # Reference is incorrectly assumed not to have spaces

With data flow analysis, ShellCheck will trace all possible paths through the program (represented as a nifty Control Flow Graph), and determine that the variable's value could come from either branch, and must therefore be quoted.
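To make the runtime consequence concrete, here is a minimal sketch of the same if/else shape. The helpers `count_args` and `pick_value` are hypothetical names invented for this illustration, and `printf 'two words'` stands in for the unspecified `foo` command.

```shell
#!/bin/sh
# count_args prints how many arguments it received.
count_args() {
  echo "$#"
}

# pick_value mirrors the if/else above: one branch assigns command
# output (an arbitrary string), the other a literal number.
# Hypothetical helper, not part of ShellCheck.
pick_value() {
  if [ "$1" = "dynamic" ]
  then
    n="$(printf 'two words')"   # stands in for "$(foo)"
  else
    n=0
  fi
}

pick_value dynamic
unquoted=$(count_args $n)    # unquoted on purpose: word splitting applies
quoted=$(count_args "$n")    # quoted: always a single argument

pick_value static
literal=$(count_args $n)     # n=0 has nothing to split
```

Only the branch that assigns command output is affected by word splitting, which is why a check that looks only at the textually latest assignment (`n=0`) misses the problem.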

It works correctly for significantly more complex examples as well:

if x
then
  modifyglobal() {
    n=0
  }
else
  modifyglobal() {
    n=1
  }
fi

modifylocal() {
  local n
  n=$(foo)
}

if y
then
  modifyglobal # ShellCheck knows that either function will assign a number
elif z
then
  n=0
  modifylocal  # ShellCheck knows that this modifies a different n
else
  n=0
  ( n=$(foo) ) # ShellCheck knows that this modification is discarded
fi

while w
do
  echo $n
  n=1          # ShellCheck knows that this may affect the line above
done

If any of these paths stop assigning a safe value, SC2086 will trigger.
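As a rough illustration, the sketch below changes one of the two `modifyglobal` definitions so that it assigns command output instead of a number. The wrapper `define_modifyglobal` and the stand-in `foo` are assumptions made up for this example so the script can run on its own.

```shell
#!/bin/sh
# Hypothetical stand-in for any command whose output is not known.
foo() { echo "a b"; }

# Hypothetical wrapper that picks one of two definitions, like the
# "if x" block in the example above.
define_modifyglobal() {
  if [ "$1" = "safe" ]
  then
    modifyglobal() { n=0; }
  else
    modifyglobal() { n=$(foo); }   # no longer a known-safe value
  fi
}

define_modifyglobal unsafe
modifyglobal
set -- $n                # unquoted on purpose; this is the kind of reference SC2086 is about
unsafe_words=$#

define_modifyglobal safe
modifyglobal
set -- $n
safe_words=$#
```

Because one reachable definition can now produce a value containing spaces, the unquoted `$n` is no longer safe on every path.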

Any checks that rely on variable values or general state at a point in the script will be much more accurate, and this will allow a whole new family of checks that were not possible/feasible before.

Limitations

There are of course some limitations:

  • All conditions are treated opaquely. ShellCheck still explores the else branch of an if true statement. Similarly, [ "$x" -eq 0 ] && a; [ "$x" -eq 1 ] && b; will assume that both a and b can run.
  • Stdin/stdout is treated opaquely, so $(echo 0) is assumed to be an arbitrary string.
  • trap and set -e are not yet supported, and builtins like declare can't be overridden by functions.
  • This is ~2000 lines worth of new, relatively complex code. It's bound to introduce some issues, including ones that were previously fixed in older versions. Fortunately the foundation is much stronger this time around.
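The first two limitations can be sketched with a small script. At runtime every value here is a plain number, but going by the descriptions above, the analysis would presumably still treat `a` and `b` as possibly arbitrary strings and assume both `&&` branches can run. The variable names are chosen only for this illustration.

```shell
#!/bin/sh
# Opaque condition: the else branch never runs, yet it is still explored.
if true
then
  a=0
else
  a=$(cat /dev/null)
fi

# Opaque stdout: the command substitution always yields "0" at runtime.
b=$(echo 0)

# Mutually exclusive tests: only one of these can succeed for a given x.
x=0
ran=""
[ "$x" -eq 0 ] && ran="${ran}a"
[ "$x" -eq 1 ] && ran="${ran}b"
```

In all three cases the analysis is conservative: it may warn about code that is actually safe, rather than miss a path that could be taken.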

Performance

Initial prototypes worked well on small scripts, but had unacceptable performance on large scripts. A lot of work was put into avoiding exponential behavior. The performance overhead should be negligible on <1000 line scripts (150ms vs 100ms), and 10k+ line scripts should still finish in reasonable time. CPU and memory usage is still an active area of improvement.

Other

In addition to the CFG and DFA modules, this change introduces ShellCheck.Debug, a module full of convenience functions like shellcheckString :: String -> CheckResult and everything in between, like stringToAst :: String -> Token and astToCfg :: Token -> CFGraph, plus features for outputting ASTs, CFGs, and DFAs to GraphViz format.

koalaman, Jul 20 '22 17:07