shellcheck
                                
Unable to satisfy SC2021 with busybox tr
For bugs
- Rule Id (if any, e.g. SC1000): SC2021
- My shellcheck version (shellcheck --version or "online"): online
- [x] The rule's wiki page does not already cover this (e.g. https://shellcheck.net/wiki/SC2086)
- [x] I tried on shellcheck.net and verified that this is still a problem on the latest commit
Here's a snippet or screenshot that shows the problem:
#!/bin/sh
tr '[A-Z]' '[a-z]' < "${0}"
Here's what shellcheck currently says:
tr '[A-Z]' '[a-z]' < "${0}"
    ^-- SC2021: Don't use [] around classes in tr, it replaces literal square brackets.
               ^-- SC2021: Don't use [] around classes in tr, it replaces literal square brackets.
The problem is mostly related to busybox tr, which does not support character classes. This puts us in a pickle, as both '[A-Z]' '[a-z]' and 'A-Z' 'a-z' do work there, yet shellcheck flags each of them.
I think it's reasonably fair to assume that in most cases where the upper/lower character classes are used with tr, the operation is a replace rather than a delete. Granted, it may only work because the number of symbols matches, and thus [ is still replaced with [.
Which brings us to the next scenario, and I think maybe more to the issue:
#!/bin/sh
tr 'A-Z' 'a-z' < "${0}"
which comes up with
tr 'A-Z' 'a-z' < "${0}"
     ^-- SC2019: Use '[:upper:]' to support accents and foreign alphabets.
            ^-- SC2018: Use '[:lower:]' to support accents and foreign alphabets.
However, this is exactly what busybox's tr does not do: [:upper:] and [:lower:] are not supported; in fact, they are silently ignored.
I'm not sure when this was added to shellcheck, as I only recently started to get both errors.
I understand that the actual character classes are superior, as they support a wider language set. But as this is not widely supported, a compromise would be good.
- Character classes without brackets ('A-Z', 'a-z', etc.) should, as now, trigger the recommendation to use proper character classes, because of their wider language support.
- Character classes with brackets ('[A-Z]', '[a-z]', etc.), when used on their own (e.g. with -c or -d), should, as now, warn (maybe in orange, actually) that the brackets are unintentionally deleted as well.
- Character classes with brackets ('[A-Z]', '[a-z]', etc.), when used in combination (i.e. when both set1 and set2 are given), should be accepted as is. This could be a simple check, or a more thorough one that verifies that the number of symbols matches and that the brackets are in the same positions, so that the operation is in effect a replace rather than a deletion.
This would allow ancient System V tr but, more importantly, busybox's tr and maybe others, to keep accepting character classes. The brackets would again, like in the old days, be used to identify a character class, with the restriction that this only applies to find-and-replace.
It is not a perfect solution, but until busybox's tr supports classes (which may be never), it would stop shellcheck from flagging valid, correctly working behavior.
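The difference between the replace case and the delete case described above can be seen in a quick experiment. This is only a minimal sketch, assuming a tr that treats '[' and ']' inside these sets as literal characters; the variable names and the sample input are made up for illustration.

```sh
#!/bin/sh
# Assumption: this tr treats '[' and ']' as ordinary set members.

# Replace: '[' maps to '[' and ']' maps to ']', so the brackets survive.
replaced=$(printf '%s' '[abc] 123' | tr '[a-z]' '[A-Z]')

# Delete: the brackets are part of the set, so they are removed as well.
deleted=$(printf '%s' '[abc] 123' | tr -d '[a-z]')

printf 'replace: "%s"\n' "$replaced"   # replace: "[ABC] 123"
printf 'delete:  "%s"\n' "$deleted"    # delete:  " 123"
```

The replace form only looks harmless because both sets have a bracket in the same position; the delete form silently eats brackets from the input.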
Aww, support for [:lower:] and [:upper:] is required by POSIX, so this is really too bad.
I'm not a fan of accepting tr '[A-Z]' '[a-z]' without comment, since it has the same problems that tr 'A-Z' 'a-z' does, plus it perpetuates the useless use of brackets. Is the purpose primarily to avoid ignoring it with a directive?
In my particular case, it would mean disabling at a global level, which is sad, as I would then miss the other useful cases of SC2021 (unless there's an easier trick to convert content between upper and lower case).
When you say 'the same problems', what are the problems of 'A-Z' 'a-z'? As that indicates that translating lower/upper isn't the exception to the rule.
Bash has operators for this: ^^ for upper, ,, for lower, and ~~ for invert. Go check the bash man page; just search for ,, and read. This not only eliminates the tr confusion but also avoids starting yet another subprocess. And don’t feel bad; I’ve been programming bash for years and I just learned these last week. I don’t know what the minimum version of bash is for these.
Maybe change the error to suggest these operators?
If you are running on BusyBox, you probably don't have Bash.
The problem with tr 'A-Z' 'a-z' is that it's canonically used to translate from uppercase to lowercase, but fails for non-US-ASCII characters. The suggestion is not emitted for other ranges, like tr 'a-c' '1-3'.
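To illustrate the non-ASCII point, here is a minimal sketch. It assumes UTF-8 input and runs tr under LC_ALL=C (an assumption made here only to keep the result predictable). The bytes \303\200 are the UTF-8 encoding of 'À', and the variable names are hypothetical.

```sh
#!/bin/sh
# "ÀB", written with octal escapes so the script itself stays pure ASCII.
input=$(printf '\303\200B')

# Only the ASCII 'B' falls inside A-Z; the two bytes of 'À' pass through unchanged.
lowered=$(printf '%s' "$input" | LC_ALL=C tr 'A-Z' 'a-z')

# "Àb": the accented capital was not lowered.
expected=$(printf '\303\200b')
if [ "$lowered" = "$expected" ]; then
    echo "accented capital left untouched"
fi
```

Whether '[:upper:]' '[:lower:]' handles such characters any better depends on the tr implementation and the locale, so this only demonstrates the limitation of the range form.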
While slightly awkward, you don't have to disable globally when tr is the first command in the script. You can e.g. use a dummy command:
#!/bin/sh
:
# shellcheck disable=SC2021
tr '[a-z]' '[A-Z]' < "${0}"
or curly braces:
#!/bin/sh
{
# shellcheck disable=SC2021
tr '[a-z]' '[A-Z]' < "${0}"
}
Not only that, my shared snippet explicitly says 'sh', which in this repository means (or should mean) a POSIX-compliant shell :) (granted, busybox's tr apparently is not :p)
I try to avoid bashisms like the plague, so I'm sorry for not mentioning this of course :)
@koalaman how posix compliant is UTF-8? or 'other languages'?
I suppose I'll just add the shellcheck directive, as I don't see another way. You do make a very valid point, of course: in my case, I use the directory name in lower case for something, and the filesystem can very happily contain all kinds of characters, on which this then fails.
Unless you have other suggestions, I suppose we have to close this here, and open a ticket for busybox to support character classes in tr :) (maybe they don't due to the difficulty of UTF-8?)
For me echo TEST | busybox tr '[:upper:]' '[:lower:]' seems to work correctly. Maybe character class support was added to busybox tr?
These work perfectly for me in BusyBox:
printf '%s\n' "abc123" | tr '0-9' '0' | tr 'a-z' 'x' | tr 'A-Z' 'X'
printf '%s\n' "abc123" | tr '[:digit:]' '0' | tr '[:lower:]' 'x' | tr '[:upper:]' 'X'
Maybe most people use very old versions of BusyBox.
Either very old versions, or they have disabled ENABLE_FEATURE_TR_CLASSES. Character classes were added in 2005...
https://git.busybox.net/busybox/commit/?id=f1048143ee4360affc66c39961f7862ea914400d
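For scripts that may end up on a build without ENABLE_FEATURE_TR_CLASSES, one option is to probe tr at runtime and fall back to an ASCII-only mapping. This is a sketch rather than an established recipe: tr_supports_classes and to_lower are hypothetical helper names, and the probe assumes that a tr without class support either fails or leaves 'A' unconverted.

```sh
#!/bin/sh
# Hypothetical helper: succeeds only if this tr understands [:upper:]/[:lower:].
tr_supports_classes() {
    [ "$(printf 'A' | tr '[:upper:]' '[:lower:]' 2>/dev/null)" = "a" ]
}

# Hypothetical helper: lower-cases stdin, using character classes when available.
to_lower() {
    if tr_supports_classes; then
        tr '[:upper:]' '[:lower:]'
    else
        # ASCII-only fallback with both sets spelled out in full.
        tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz'
    fi
}

printf '%s\n' 'Hello World' | to_lower
```

Either branch lower-cases plain ASCII input; only the class-based branch has a chance of handling anything beyond that, depending on the tr build and locale.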
When I opened this issue in 2019, I wasn't aware of ENABLE_FEATURE_TR_CLASSES, and going back to figure out what was causing this is interesting.
I ran the alpine 3.1 container without checking its age (it probably predates 2019), and it does indeed support character classes as expected.
docker run --rm -it alpine:3.1 /bin/sh
/ # printf '%s\n' "aBc123" | tr '[:digit:]' '0' | tr '[:lower:]' 'x' | tr '[:upper:]' 'X'
xXx000
Since I was working a lot with alpine (and thus busybox), I can only conclude that I must have been using some different variant back then. Or did Alpine push a minor version with this change to all its repositories?
Currently it's certainly enabled: https://git.alpinelinux.org/aports/tree/main/busybox/busyboxconfig#n343 but even in alpine 2.0 it was enabled https://git.alpinelinux.org/aports/tree/main/busybox/busyboxconfig?h=2.0-stable#n173
So without chasing down this rabbit hole any further, my only conclusion can be that this may have been an issue with some other busybox build that doesn't enable it.
I've updated the topic to make this clear, and apologize for not diving slightly deeper into the busybox code ;)
Let's close this issue, as busybox does support this if enabled. The documentation could probably be updated to reflect this, of course.