toybox icon indicating copy to clipboard operation
toybox copied to clipboard

`posix/ls.c` compare should use `strcasecmp` rather than `strcmp`

Open omertuc opened this issue 3 years ago • 12 comments

Regular strcmp keeps all upper-case files on top, which is confusing for humans. strcasecmp is more natural and seems to be the behavior in other popular implementations of ls.

See this example, notably the location of Config.in vs the location of configure:

Before this fix:

./ls -lah
total 492K
drwx------  10 omer omer  460 2022-07-14 11:15 .
drwxrwxrwt 134 root root 6.1K 2022-07-14 11:15 ..
-rw-rw-r--   1 omer omer 6.9K 2022-07-14 11:15 .config
-rw-rw-r--   1 omer omer   62 2022-07-14 11:15 .git
drwxrwxr-x   3 omer omer   60 2022-07-14 11:15 .github
-rw-rw-r--   1 omer omer  114 2022-07-14 11:15 .gitignore
-rw-rw-r--   1 omer omer  41K 2022-07-14 11:15 .singlemake
-rw-rw-r--   1 omer omer 5.8K 2022-07-14 11:15 Config.in
-rw-rw-r--   1 omer omer  666 2022-07-14 11:15 LICENSE
-rw-rw-r--   1 omer omer 2.1K 2022-07-14 11:15 Makefile
-rw-rw-r--   1 omer omer 6.1K 2022-07-14 11:15 README
-rwxrwxr-x   1 omer omer  926 2022-07-14 11:15 configure
drwxrwxr-x   4 omer omer  300 2022-07-14 11:15 generated
drwxrwxr-x   3 omer omer  520 2022-07-14 11:15 kconfig
drwxrwxr-x   2 omer omer  400 2022-07-14 11:15 lib
lrwxrwxrwx   1 omer omer    6 2022-07-14 11:15 ls -> toybox
-rw-rw-r--   1 omer omer 8.6K 2022-07-14 11:15 main.c
drwxrwxr-x   3 omer omer  460 2022-07-14 11:15 scripts
drwxrwxr-x   3 omer omer 2.3K 2022-07-14 11:15 tests
-r-xr-xr-x   1 omer omer 388K 2022-07-14 11:15 toybox
drwxrwxr-x   9 omer omer  180 2022-07-14 11:15 toys
-rw-rw-r--   1 omer omer 3.6K 2022-07-14 11:15 toys.h
drwxrwxr-x   5 omer omer  480 2022-07-14 11:15 www

After this fix:

./ls -lah
total 560K
drwxrwxr-x  11 omer omer 4.0K 2022-07-14 11:13 .
drwxr-xr-x 403 omer omer  20K 2022-07-14 11:08 ..
-rw-rw-r--   1 omer omer 6.9K 2022-07-14 11:13 .config
-rw-rw-r--   1 omer omer 6.9K 2022-07-14 11:08 .config.old
drwxrwxr-x   8 omer omer 4.0K 2022-07-14 11:08 .git
drwxrwxr-x   3 omer omer 4.0K 2022-07-14 11:08 .github
-rw-rw-r--   1 omer omer  114 2022-07-14 11:08 .gitignore
-rw-rw-r--   1 omer omer  41K 2022-07-14 11:13 .singlemake
-rw-rw-r--   1 omer omer 5.8K 2022-07-14 11:08 Config.in
-rwxrwxr-x   1 omer omer  926 2022-07-14 11:08 configure
drwxrwxr-x   4 omer omer 4.0K 2022-07-14 11:08 generated
drwxrwxr-x   3 omer omer 4.0K 2022-07-14 11:08 kconfig
drwxrwxr-x   2 omer omer 4.0K 2022-07-14 11:08 lib
-rw-rw-r--   1 omer omer  666 2022-07-14 11:08 LICENSE
lrwxrwxrwx   1 omer omer    6 2022-07-14 11:08 ls -> toybox
-rw-rw-r--   1 omer omer 8.6K 2022-07-14 11:08 main.c
-rw-rw-r--   1 omer omer 2.1K 2022-07-14 11:08 Makefile
-rw-rw-r--   1 omer omer 6.1K 2022-07-14 11:08 README
drwxrwxr-x   3 omer omer 4.0K 2022-07-14 11:08 scripts
drwxrwxr-x   3 omer omer 4.0K 2022-07-14 11:08 tests
-r-xr-xr-x   1 omer omer 388K 2022-07-14 11:13 toybox
drwxrwxr-x   9 omer omer 4.0K 2022-07-14 11:08 toys
-rw-rw-r--   1 omer omer 3.6K 2022-07-14 11:08 toys.h
drwxrwxr-x   5 omer omer 4.0K 2022-07-14 11:08 www

omertuc avatar Jul 14 '22 09:07 omertuc

That behavior is locale dependent, and not what happens for the posix/c locale:

$ LC_ALL=c ls -l total 76 -rw-r--r-- 1 landley landley 5969 May 12 11:32 Config.in -rw-r--r-- 1 landley landley 666 May 12 11:32 LICENSE -rw-r--r-- 1 landley landley 2185 Jul 10 04:55 Makefile -rw-r--r-- 1 landley landley 6306 May 12 11:32 README -rwxr-xr-x 1 landley landley 926 Jul 1 06:31 configure drwxr-xr-x 3 landley landley 4096 Jul 14 05:37 kconfig drwxr-xr-x 2 landley landley 4096 Jul 12 22:07 lib -rw-r--r-- 1 landley landley 8839 May 12 11:32 main.c drwxr-xr-x 3 landley landley 4096 Jul 12 04:51 scripts drwxr-xr-x 3 landley landley 4096 Jul 11 04:32 tests drwxr-xr-x 9 landley landley 4096 May 12 11:32 toys -rw-r--r-- 1 landley landley 3727 May 12 11:32 toys.h drwxr-xr-x 5 landley landley 4096 May 23 10:48 www

landley avatar Jul 14 '22 10:07 landley

I'm inclined to close this since there's been no reply to the observation that toybox only implements the C locale, but then again ls --sort=WORD exists so ls --sort=case wouldn't be unreasonable? (But that also wouldn't be supported by the gnu/dammit version, which only seems to be able to specify case sensitivity or insensitivity via locale...)

landley avatar Sep 13 '22 19:09 landley

there's been no reply

Yeah haven't had time to look into it, i've tried to use strcoll instead of strcasecmp but for some reason it still doesn't respect locale, not entirely sure why, but didn't dig too deep yet

toybox only implements the C locale

Could you elaborate on what you mean by that?

omertuc avatar Sep 13 '22 20:09 omertuc

LC_ALL=C is a synonym for the posix locale (modulo arguing about whether or not that includes utf8 support).

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html

Most libc functions don't care about locale. Support for locales is something you have to implement in your program, and toybox implemented the Posix locale behavior only. (There's a one sentence mention of this in https://landley.net/toybox/design.html that I should possibly expand a bit, but how many details do we need about NOT doing something? It's a design decision: locale support is out of scope for the project. Even the little internationalization support toybox tries to do, such as using strerror() to convert our error values into the local language, generallly aren't supported by android/bionic because their internationalization plumbing is in the GUI layer.)

In main.c we do fiddle a little with the locale plumbing to ensure unicode and utf-8 support is enabled (search for LC_CTYPE) for functions like towupper(), but otherwise toybox does not support changing locales.

Functions like strcoll() assume you've called setlocale() with appropriate values. None of our code calls nl_langinfo() to ask questions about our current locale when printing things like date formats, and we actually removed comma support from human_readable() in commit 5b7cc6d6c2a5 to avoid having to properly internationalize it.

landley avatar Sep 14 '22 03:09 landley

P.S. I'm saying changing sort behavior by setting an environment variable ain't gonna happen, but adding a command line option to tweak sort order seems reasonable? There's already "ls -ltr" and such. You could alias ls="ls --sort=nocase" as easily as "ls --color=auto" (which most distros seem to alias ls to by default these days).

landley avatar Sep 14 '22 03:09 landley

Functions like strcoll() assume you've called setlocale() with appropriate values

Ah ok. My assumption was that just setting the environment variable should be enough, didn't realize setlocale() has to be called. So that explains why I didn't see toybox strcoll() responding to my environment variable change

P.S. I'm saying changing sort behavior by setting an environment variable ain't gonna happen

If not respecting locale is a design goal, then that's fair. It's just a shame that the C locale has this unfortunate directory case sorting behavior - my goal with this PR was mostly to make the default ls experience in adb shell nicer and more consistent with "typical" Linux machines (which is probably where most people would encounter this sorting "problem"). I thought it was an actual bug but I'm now convinced it's by design, so I don't see much point in adding my "fix" or adding support for --sort=nocase, as 99% of users are not gonna do that / not going to be aware of it, so it has little value. (My perspective is narrow, just the typical adb shell user, maybe in other toybox contexts it's different and could be useful).

omertuc avatar Sep 14 '22 11:09 omertuc

but adding a command line option to tweak sort order seems reasonable?

agreed, it might get picked up in the future (POSIX 3.0?)

as 99% of users are not gonna do that / not going to be aware of it, so it has little value

I have some extremely crafted ls output, and I for one would use --sort=nocase, --sort=case and --sort=none .

paulwratt avatar Sep 15 '22 02:09 paulwratt

It's not about "respecting" locale: we didn't implement it.

Once toysh gets command editing and history and stuff, an /etc/profile or similar that does alias ls="ls --color" and friends seems reasonable even for adb? (Files like ~/.bashrc only gets sourced for interactive shells, sticking an "alias" there shouldn't affect non-interactive uses.)

You came to me with a use case, and I respect that and want to handle that use case. Just in a different way than you suggested.

landley avatar Sep 15 '22 13:09 landley

Once toysh gets command editing and history and stuff, an /etc/profile or similar that does alias ls="ls --color" and friends seems reasonable even for adb? (Files like ~/.bashrc only gets sourced for interactive shells, sticking an "alias" there shouldn't affect non-interactive uses.)

funnily enough, i already have a mkshrc change from an engineer at xiaomi to add that alias, but i don't like it because the usual debian "if you don't like the default, edit your copy of the file" doesn't apply on Android. and since i hate aliases in general but color ls in particular, that one's a particularly hard sell. i really don't know what the answer to that is though. there's clearly a problem (since reasonable people disagree on this color stuff in particular, but all kinds of aliases in general), but i've no idea what a reasonable solution for Android would actually look like. (my least worst ideas involve environment variables passed from adb on the host, but even there i'm worried about breaking test infrastructure if adb's/the shell's behavior depends on the host environment. because not only is "shell" the user that every human runs as, it's the user that most/all test infrastructure runs as too [with the alternative being "root", which isn't helpful].)

enh-google avatar Sep 15 '22 22:09 enh-google

You've heard my rant on how open source development structurally can't handle aesthetic issues: https://landley.net/notes-2010.html#13-08-2010

Successful projects have a BDFL make aesthetic decisions for them, and for the Android base OS that's basically you. For toybox, however... I can't say anything intelligent here because I don't know why "edit your copy" doesn't apply? (Context?)

If you want an environment variable... to be honest I still don't understand how BASH_ENV is NOT considered a security violation:

$ echo 'echo this cannot possibly be a bood idea' > potato $ echo -e '#!/bin/bash\necho sigh' > thingy $ chmod +x thingy $ BASH_ENV=$PWD/potato ./thingy this cannot possibly be a good idea sigh

But then LD_TRACE_LOADED_OBJECTS=1 is... very gnu.

You can also have an rc file source another one (out of "user mount space" or something), and "source /path/to/doesnotexist 2>/dev/null" is harmless when the file isn't there, but I dunno if developers have anything conceptually like a "home" directory on an android device they're doing development on, where they COULD symlink a file full of interactive preferences...

landley avatar Sep 16 '22 05:09 landley

FWIW part of the reason I dont use variables (in a situation like this) is because changes in child processes dont propagate, whereas a change in a file however does propagate, especially if you require (or forced into) a clean environment

paulwratt avatar Sep 16 '22 05:09 paulwratt

You've heard my rant on how open source development structurally can't handle aesthetic issues: https://landley.net/notes-2010.html#13-08-2010

Successful projects have a BDFL make aesthetic decisions for them, and for the Android base OS that's basically you. For toybox, however... I can't say anything intelligent here because I don't know why "edit your copy" doesn't apply? (Context?)

because the mkshrc ends up on a read-only partition. (sure, i could also read another file in /data/local/tmp/ but that would be reverted as soon as security found out.)

If you want an environment variable... to be honest I still don't understand how BASH_ENV is NOT considered a security violation:

heh, yeah, i don't want an environment variable; both reproducibility and security suffer from such things. so, yes, i'm surprised to learn that bash already has such a thing. (so if you implement that, please leave me a switch to disable that at build time!)

You can also have an rc file source another one (out of "user mount space" or something), and "source /path/to/doesnotexist 2>/dev/null" is harmless when the file isn't there, but I dunno if developers have anything conceptually like a "home" directory on an android device they're doing development on, where they COULD symlink a file full of interactive preferences...

yeah, that's the trouble ... there's only really "shell" and "root". (you can also "runas" in your app's context [if that app has debuggable=true, which can't be true for anything served on the Play Store], but i think it's only OS developers who're interested in this kind of thing.)

so far my advice is "edit your local copy of mkshrc". because funnily enough i do that myself --- over the years i've got so used to having a bold prompt that i rely on it quite heavily to see where commands start/end. but i don't feel like i can inflict that on everyone for similar reasons... it's bound to break someone's testing somewhere.

enh-google avatar Sep 16 '22 15:09 enh-google

I added "ls ---sort=nocase", does that address your issue?

I also overcomplicated things by making --sort take a csv list of fallback sorts to perform in order, and adding short options for each new sort type, and for some reason -~ isn't working so I'm tracking down why, and then I need to add test suite entries for all of it, but I thought I should at least mention this issue has beenn theoretically addressed if not quite in "can of worms is once again closed" state...

landley avatar Apr 05 '23 12:04 landley

Fixed the -~ thing. Still need to add --sort tests to the test suite.

landley avatar Apr 05 '23 12:04 landley

Adding the new ls tests got derailed by http://lists.landley.net/pipermail/toybox-landley.net/2023-May/029559.html which I've revisited a few times since and went "ew" at each time, and I think I just want to diverge from gnu here because there's cumulative stupid, but thought I'd ask for opinions first.

It mostly doesn't change scripts, because both -b and the new --gratuitous-longopt=even-longer behavior were A) added within the past few years, B) only apply to terminal output, so you can't even pipe them to less and sanely analyze the result. (I need to make a PTY pipe in toysh. Pipe the output with |! or some such and have the input be a pty emulating the terminal characteristics of /dev/tty of the pipeline context. Ahem: todo item for much later...)

landley avatar May 04 '23 08:05 landley

Try commit 0b20d799bc9a

landley avatar Jun 04 '23 15:06 landley

Assuming the commit I asked for testing on back in June fixed it.

landley avatar Dec 17 '23 09:12 landley