False positive of SC2059 where the "format string" is "${array[@]}"
For bugs
- Rule Id (if any, e.g. SC1000): SC2059
- My shellcheck version (
shellcheck --versionor "online"): commit 3785a08 and online - [x] I tried on shellcheck.net and verified that this is still a problem on the latest commit
Here's a snippet or screenshot that shows the problem:
#!/usr/bin/env bash
printf_parameters=("$@")
printf -- "${printf_parameters[@]}"
Here's what shellcheck currently says:
Line 3:
printf -- "${printf_parameters[@]}"
^-- SC2059: Don't use variables in the printf format string. Use printf "..%s.." "$foo".
Here's what I wanted or expected to see:
No errors as "$array[@]" isn't even expanded yet and is not a regular variable
On Thu, 12 Oct 2017, 林博仁(Buo-Ren Lin) wrote:
printf -- "${printf_parameters[@]}" ^-- SC2059: Don't use variables in the printf format string. Use printf "..%s.." "$foo".
Here's what I wanted or expected to see:
No errors as
"$array[@]"isn't even expanded yet
(What do you mean by "isn't expanded"? It sure looks expanded to me.)
This is working as intended, because in the majority of cases where people write
printf "$foo"
what they need is
printf %s "$foo"
If your particular use-case is that you pass in the format string expecting it to be used by printf, then you should disable SC2059 for that line.
It's not "wrong" or "broken" to disable shellcheck warnings. On the contrary, the shellcheck disable pragma acts as documentation that some slightly odd piece of code is not a mistake, that on the contrary someone thought about it and wanted it that way.
It would arguably aid future readability to write something more like this:
fmt="${1:?Missing or empty format}"
args=("${@:2}")
# shellcheck disable=SC2059
printf "$fmt" "${args[@]}"
Apologies for the delayal.
I do can disable myself, however I believe this is a special case where SC2059 shouldn't match on because that isn't exactly a format control string, but an array expansion.
Same for printf "$@"
Arguably this particular case could have a separate "ensure the first element is a format string" info message, but it definitely needs a warning since this could very much look like it's supposed to print an array
Granted, we can use disable, but perhaps we could make an exception when the format string is obviously a format string, eg. when the variable is named $fmt, $format or $format_string ?
local format=' %s %-32s # %s\n'
local blank="${pname//?/ }"
printf '%s usage:\n\n' "${pname}"
printf "$format" "${pname}" '-h|--help|help' 'print this help.'
# ... and 20 other such lines...
I think trying to do NLP on the variable name will be more confusing than anything. Like, why does it work with "fmt" but not "frmt" or "f" or "printf_arguments"?
I think trying to do NLP on the variable name will be more confusing than anything. Like, why does it work with "fmt" but not "frmt" or "f" or "printf_arguments"?
Agreed. If the contents of the variable is known at analysis-time, and contains format specifiers, that should be enough to decide not to issue the warning. Here, data flow analysis can show that the variable contains valid format specifiers, so it could disable the warning.
I think trying to do NLP on the variable name will be more confusing than anything. Like, why does it work with "fmt" but not "frmt" or "f" or "printf_arguments"?
Agreed. If the contents of the variable is known at analysis-time, and contains format specifiers, that should be enough to decide not to issue the warning. Here, data flow analysis can show that the variable contains valid format specifiers, so it could disable the warning.
https://www.shellcheck.net/wiki/SC2059 has the goal of; if it is possible that non-constant "%"s can exist in printf's [format] argument, warn to separate the array/variable from printf's [format] (since tainted variables with control sequences are dangerous to use for [format]).
What will the data flow analysis do? If the variable has sources such: as user input, filesystem reads, or sockets, give the warning; or else disable? So: allow that [format] includes variables, but require that those variables are constant? Such as https://codeql.github.com/codeql-query-help/javascript/js-tainted-format-string/ has?
Here, data flow analysis can show that the variable contains valid format specifiers, so it could disable the warning.
In the shell, data flow analysis is much less reliable than you think it is. Consider:
fmt='name: %s\tvalue=%s\n'
log "$value"
printf "$fmt" "$name" "$value"
You might think it's obvious that fmt contains a static value. You would be wrong: log could be a function that modifies fmt, perhaps tainting it with data from an untrustworthy source.
You cannot perform static analysis to find the definitions of functions, because they are not static declarations but rather dynamic commands that are invoked at run-time.
fn=$( grep log /etc/myconfig ) # this matches a line with just the word "log"
if ((RANDOM % 2))
then
eval "$fn()"' { read -p 'Format? ' fmt ; echo "Using value=$* with fmt=$fmt" >> /var/log/my.log ; }'
else
eval "$fn()"' { echo "Using value=$*" >> /var/log/my.log ; }'
fi
fmt='name: %s\tvalue=%s\n'
log "$value"
printf "$fmt" "$name" "$value"
It is critically important that Shellcheck errs on the side of producing false warnings rather than failing to give real warnings.
Reducing false warnings and false non-warnings both to zero equates to the halting problem: it is mathematically provable that it can't be done in finite time. This means that Shellcheck cannot be "perfect" and there will always be exceptions where it cannot do the "intuitive" thing.
Rather than making ShellCheck arbitrarily complicated (and slower, and less understandable), we are all better off if everyone makes peace with the idea that we should expect to disable a small proportion of checks.
Here, data flow analysis can show that the variable contains valid format specifiers, so it could disable the warning.
In the shell, data flow analysis is much less reliable than you think it is. Consider:
fmt='name: %s\tvalue=%s\n' log "$value" printf "$fmt" "$name" "$value"You might think it's obvious that
fmtcontains a static value. You would be wrong:logcould be a function that modifiesfmt, perhaps tainting it with data from an untrustworthy source.You cannot perform static analysis to find the definitions of functions, because they are not static declarations but rather dynamic commands that are invoked at run-time.
Wish that static analysis of code flow allows to produce a list of functions which alter their inputs, plus a list of functions which do not;
allows that your example log command is flagged if calls will taint $value, but passes if not.
If this requires shellcheck to include a whole interpreter of /bin/sh, perhaps you can just include an open source package which does this for you (lots of FLOSS packages process source code into "intermediate" computer-usable formats).
If such will cause too much code bloat, just remove the sections which shellcheck does not use.
This has uses for SC2059, plus uses for numerous other flags/notices.
Here, data flow analysis can show that the variable contains valid format specifiers, so it could disable the warning.
In the shell, data flow analysis is much less reliable than you think it is. Consider:
fmt='name: %s\tvalue=%s\n' log "$value" printf "$fmt" "$name" "$value"
More simple solution: Is not true analysis of code flow if you just assume that all functions alter their inputs unless the input variable is RO, but this allows you to exclude lots of false positives (without false negatives), so that the value of those notices improves.
This still misses an important point: in general it is not possible to determine through static analysis what is or isn't a function call; it can vary between runs of the same script.
And a major reason for using static analysis is to allow us to avoid running a script, since measures intended to prevent a script from damaging the host system may alter its behaviour in ways that hide errors.
The example given was merely to illustrate the fact that static analysis cannot detect all possible behaviours; any attempt to "solve" a specific example will simply result in new ways to trigger false positive warnings. Please read up on "The Halting Problem".
And perhaps this: https://cwe.mitre.org/data/definitions/134.html
On Fri, Jan 31, 2025 at 3:52 PM Martin Kealey @.***> wrote:
This still misses an important point: in general it is not possible to determine through static analysis what is or isn't a function call; it can vary between runs of the same script.
And a major reason for using static analysis is to allow us to avoid running a script, since measures intended to prevent a script from damaging the host system may alter its behaviour in ways that hide errors.
The example given was merely to illustrate the fact that static analysis cannot detect all possible behaviours; any attempt to "solve" a specific example will simply result in new ways to trigger false positive warnings. Please read up on "The Halting Problem".
— Reply to this email directly, view it on GitHub https://github.com/koalaman/shellcheck/issues/1028#issuecomment-2628583321, or unsubscribe https://github.com/notifications/unsubscribe-auth/AUF2F2YLVC4BGHBOCOXL7PT2NQEEJAVCNFSM6AAAAABV24MYEGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMMRYGU4DGMZSGE . You are receiving this because you are subscribed to this thread.Message ID: @.***>
This still misses an important point: in general it is not possible to determine through static analysis
Oops,used the wrong word. So, not "static analysis", but whatever you call what https://codeql.github.com/codeql-query-help/javascript/js-tainted-format-string/ does.
@brlin-tw small point of order: please fix the title of this issue to have braces around array[@].
@koalaman
@brlin-tw small point of order: please fix the title of this issue to have braces around
array[@]
Done.