shellcheck icon indicating copy to clipboard operation
shellcheck copied to clipboard

False positive of SC2059 where the "format string" is "${array[@]}"

Open brlin-tw opened this issue 8 years ago • 16 comments

For bugs

  • Rule Id (if any, e.g. SC1000): SC2059
  • My shellcheck version (shellcheck --version or "online"): commit 3785a08 and online
  • [x] I tried on shellcheck.net and verified that this is still a problem on the latest commit

Here's a snippet or screenshot that shows the problem:

#!/usr/bin/env bash
printf_parameters=("$@")
printf -- "${printf_parameters[@]}"

Here's what shellcheck currently says:

Line 3:
printf -- "${printf_parameters[@]}"
          ^-- SC2059: Don't use variables in the printf format string. Use printf "..%s.." "$foo".

Here's what I wanted or expected to see:

No errors as "$array[@]" isn't even expanded yet and is not a regular variable

brlin-tw avatar Oct 12 '17 18:10 brlin-tw

On Thu, 12 Oct 2017, 林博仁(Buo-Ren Lin) wrote:

printf -- "${printf_parameters[@]}" ^-- SC2059: Don't use variables in the printf format string. Use printf "..%s.." "$foo".

Here's what I wanted or expected to see:

No errors as "$array[@]" isn't even expanded yet

(What do you mean by "isn't expanded"? It sure looks expanded to me.)

This is working as intended, because in the majority of cases where people write

printf "$foo"

what they need is

printf %s "$foo"

If your particular use-case is that you pass in the format string expecting it to be used by printf, then you should disable SC2059 for that line.

It's not "wrong" or "broken" to disable shellcheck warnings. On the contrary, the shellcheck disable pragma acts as documentation that some slightly odd piece of code is not a mistake, that on the contrary someone thought about it and wanted it that way.

It would arguably aid future readability to write something more like this:

fmt="${1:?Missing or empty format}"
args=("${@:2}")

# shellcheck disable=SC2059
printf "$fmt" "${args[@]}"

kurahaupo avatar Oct 17 '17 11:10 kurahaupo

Apologies for the delayal.

I do can disable myself, however I believe this is a special case where SC2059 shouldn't match on because that isn't exactly a format control string, but an array expansion.

brlin-tw avatar Oct 28 '17 01:10 brlin-tw

Same for printf "$@"

informatimago avatar Dec 10 '18 18:12 informatimago

Arguably this particular case could have a separate "ensure the first element is a format string" info message, but it definitely needs a warning since this could very much look like it's supposed to print an array

koalaman avatar Dec 10 '18 18:12 koalaman

Granted, we can use disable, but perhaps we could make an exception when the format string is obviously a format string, eg. when the variable is named $fmt, $format or $format_string ?

    local format='    %s %-32s # %s\n'
    local blank="${pname//?/ }"
    printf '%s usage:\n\n' "${pname}"
    printf "$format" "${pname}" '-h|--help|help'                 'print this help.'
    # ... and 20 other such lines...

informatimago avatar Sep 01 '21 18:09 informatimago

I think trying to do NLP on the variable name will be more confusing than anything. Like, why does it work with "fmt" but not "frmt" or "f" or "printf_arguments"?

koalaman avatar Sep 18 '21 19:09 koalaman

I think trying to do NLP on the variable name will be more confusing than anything. Like, why does it work with "fmt" but not "frmt" or "f" or "printf_arguments"?

Agreed. If the contents of the variable is known at analysis-time, and contains format specifiers, that should be enough to decide not to issue the warning. Here, data flow analysis can show that the variable contains valid format specifiers, so it could disable the warning.

informatimago avatar Sep 18 '21 21:09 informatimago

I think trying to do NLP on the variable name will be more confusing than anything. Like, why does it work with "fmt" but not "frmt" or "f" or "printf_arguments"?

Agreed. If the contents of the variable is known at analysis-time, and contains format specifiers, that should be enough to decide not to issue the warning. Here, data flow analysis can show that the variable contains valid format specifiers, so it could disable the warning.

https://www.shellcheck.net/wiki/SC2059 has the goal of; if it is possible that non-constant "%"s can exist in printf's [format] argument, warn to separate the array/variable from printf's [format] (since tainted variables with control sequences are dangerous to use for [format]).

What will the data flow analysis do? If the variable has sources such: as user input, filesystem reads, or sockets, give the warning; or else disable? So: allow that [format] includes variables, but require that those variables are constant? Such as https://codeql.github.com/codeql-query-help/javascript/js-tainted-format-string/ has?

SwuduSusuwu avatar Jan 25 '25 03:01 SwuduSusuwu

Here, data flow analysis can show that the variable contains valid format specifiers, so it could disable the warning.

In the shell, data flow analysis is much less reliable than you think it is. Consider:

fmt='name: %s\tvalue=%s\n'
log "$value"
printf "$fmt" "$name" "$value"

You might think it's obvious that fmt contains a static value. You would be wrong: log could be a function that modifies fmt, perhaps tainting it with data from an untrustworthy source.

You cannot perform static analysis to find the definitions of functions, because they are not static declarations but rather dynamic commands that are invoked at run-time.

fn=$( grep log /etc/myconfig ) # this matches a line with just the word "log"
if ((RANDOM % 2))
then
  eval "$fn()"' { read -p 'Format? ' fmt ; echo "Using value=$* with fmt=$fmt" >> /var/log/my.log ; }'
else
  eval "$fn()"' { echo "Using value=$*" >> /var/log/my.log ; }'
fi

fmt='name: %s\tvalue=%s\n'
log "$value"
printf "$fmt" "$name" "$value"

It is critically important that Shellcheck errs on the side of producing false warnings rather than failing to give real warnings.

Reducing false warnings and false non-warnings both to zero equates to the halting problem: it is mathematically provable that it can't be done in finite time. This means that Shellcheck cannot be "perfect" and there will always be exceptions where it cannot do the "intuitive" thing.

Rather than making ShellCheck arbitrarily complicated (and slower, and less understandable), we are all better off if everyone makes peace with the idea that we should expect to disable a small proportion of checks.

kurahaupo avatar Jan 27 '25 05:01 kurahaupo

Here, data flow analysis can show that the variable contains valid format specifiers, so it could disable the warning.

In the shell, data flow analysis is much less reliable than you think it is. Consider:

fmt='name: %s\tvalue=%s\n'
log "$value"
printf "$fmt" "$name" "$value"

You might think it's obvious that fmt contains a static value. You would be wrong: log could be a function that modifies fmt, perhaps tainting it with data from an untrustworthy source.

You cannot perform static analysis to find the definitions of functions, because they are not static declarations but rather dynamic commands that are invoked at run-time.

Wish that static analysis of code flow allows to produce a list of functions which alter their inputs, plus a list of functions which do not; allows that your example log command is flagged if calls will taint $value, but passes if not.

If this requires shellcheck to include a whole interpreter of /bin/sh, perhaps you can just include an open source package which does this for you (lots of FLOSS packages process source code into "intermediate" computer-usable formats). If such will cause too much code bloat, just remove the sections which shellcheck does not use. This has uses for SC2059, plus uses for numerous other flags/notices.

SwuduSusuwu avatar Jan 30 '25 16:01 SwuduSusuwu

Here, data flow analysis can show that the variable contains valid format specifiers, so it could disable the warning.

In the shell, data flow analysis is much less reliable than you think it is. Consider:

fmt='name: %s\tvalue=%s\n'
log "$value"
printf "$fmt" "$name" "$value"

More simple solution: Is not true analysis of code flow if you just assume that all functions alter their inputs unless the input variable is RO, but this allows you to exclude lots of false positives (without false negatives), so that the value of those notices improves.

SwuduSusuwu avatar Jan 30 '25 19:01 SwuduSusuwu

This still misses an important point: in general it is not possible to determine through static analysis what is or isn't a function call; it can vary between runs of the same script.

And a major reason for using static analysis is to allow us to avoid running a script, since measures intended to prevent a script from damaging the host system may alter its behaviour in ways that hide errors.

The example given was merely to illustrate the fact that static analysis cannot detect all possible behaviours; any attempt to "solve" a specific example will simply result in new ways to trigger false positive warnings. Please read up on "The Halting Problem".

kurahaupo avatar Jan 31 '25 23:01 kurahaupo

And perhaps this: https://cwe.mitre.org/data/definitions/134.html

On Fri, Jan 31, 2025 at 3:52 PM Martin Kealey @.***> wrote:

This still misses an important point: in general it is not possible to determine through static analysis what is or isn't a function call; it can vary between runs of the same script.

And a major reason for using static analysis is to allow us to avoid running a script, since measures intended to prevent a script from damaging the host system may alter its behaviour in ways that hide errors.

The example given was merely to illustrate the fact that static analysis cannot detect all possible behaviours; any attempt to "solve" a specific example will simply result in new ways to trigger false positive warnings. Please read up on "The Halting Problem".

— Reply to this email directly, view it on GitHub https://github.com/koalaman/shellcheck/issues/1028#issuecomment-2628583321, or unsubscribe https://github.com/notifications/unsubscribe-auth/AUF2F2YLVC4BGHBOCOXL7PT2NQEEJAVCNFSM6AAAAABV24MYEGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMMRYGU4DGMZSGE . You are receiving this because you are subscribed to this thread.Message ID: @.***>

wileyhy avatar Feb 05 '25 04:02 wileyhy

This still misses an important point: in general it is not possible to determine through static analysis

Oops,used the wrong word. So, not "static analysis", but whatever you call what https://codeql.github.com/codeql-query-help/javascript/js-tainted-format-string/ does.

SwuduSusuwu avatar Feb 05 '25 04:02 SwuduSusuwu

@brlin-tw small point of order: please fix the title of this issue to have braces around array[@].

kurahaupo avatar Mar 12 '25 05:03 kurahaupo

@koalaman

@brlin-tw small point of order: please fix the title of this issue to have braces around array[@]

Done.

brlin-tw avatar Mar 12 '25 05:03 brlin-tw