Sorting is not locale based or unicode aware
So my locale is pt_BR.utf-8, and issuing ls gives the following output, which sorts accents correctly:
$ ls -l
drwxr-xr-x 2 fernando fernando 4096 jun 26 17:02 'área de trabalho'
drwxr-xr-x 3 fernando fernando 4096 jun 26 15:28 documentários
drwxr-xr-x 2 fernando fernando 4096 jul 6 17:40 documentos
drwxr-xr-x 11 fernando fernando 4096 ago 28 09:29 downloads
...
However, exa doesn't:
$ exa -l
drwxr-xr-x - fernando 6 jul 17:40 documentos
drwxr-xr-x - fernando 26 jun 15:28 documentários
drwxr-xr-x - fernando 28 ago 9:29 downloads
...
drwxr-xr-x - fernando 26 jun 17:02 área de trabalho
Another example is how ls handles punctuation:
$ ls -l
-rw-r--r-- 1 fernando fernando 773 ago 21 00:30 Reactive-Extensions-Examples.md
-rw-r--r-- 1 fernando fernando 169 ago 18 14:27 _sidebar.md
-rw-r--r-- 1 fernando fernando 1425 ago 17 20:10 Summary-of-Simplicity-Matters.md
While exa gives a different ordering:
$ exa -l
.rw-r--r-- 169 fernando 18 ago 14:27 _sidebar.md
.rw-r--r-- 773 fernando 21 ago 0:30 Reactive-Extensions-Examples.md
.rw-r--r-- 1,4k fernando 17 ago 20:10 Summary-of-Simplicity-Matters.md
Which is actually different from what I was expecting, as ls -v sorts _ at the end:
$ ls -lv
-rw-r--r-- 1 fernando fernando 773 ago 21 00:30 Reactive-Extensions-Examples.md
-rw-r--r-- 1 fernando fernando 1425 ago 17 20:10 Summary-of-Simplicity-Matters.md
-rw-r--r-- 1 fernando fernando 169 ago 18 14:27 _sidebar.md
(It seems to me that exa ignores case differently here: while ls uppercases everything, exa downcases everything.)
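The guess about case handling can be checked with a small sketch. It relies on the fact that _ (0x5F) sits between the ASCII uppercase letters (0x41 to 0x5A) and the lowercase letters (0x61 to 0x7A), so folding to lowercase puts _ before every letter, while folding to uppercase puts it after. The helpers sort_lowercased and sort_uppercased are hypothetical names made up for illustration; whether exa or ls actually folds case this way is an assumption, not something confirmed here.

```rust
// Sketch of the case-folding guess. These helpers are hypothetical and are
// not taken from exa, natord, or ls.

/// Sort names using their ASCII-lowercased form as the comparison key.
fn sort_lowercased(names: &[&str]) -> Vec<String> {
    let mut sorted: Vec<String> = names.iter().map(|s| s.to_string()).collect();
    sorted.sort_by_key(|name| name.to_ascii_lowercase());
    sorted
}

/// Sort names using their ASCII-uppercased form as the comparison key.
fn sort_uppercased(names: &[&str]) -> Vec<String> {
    let mut sorted: Vec<String> = names.iter().map(|s| s.to_string()).collect();
    sorted.sort_by_key(|name| name.to_ascii_uppercase());
    sorted
}

fn main() {
    let names = [
        "Reactive-Extensions-Examples.md",
        "_sidebar.md",
        "Summary-of-Simplicity-Matters.md",
    ];
    // With a lowercased key, '_' (0x5F) compares below 'r' and 's'.
    println!("lowercased key: {:?}", sort_lowercased(&names));
    // With an uppercased key, '_' (0x5F) compares above 'R' and 'S'.
    println!("uppercased key: {:?}", sort_uppercased(&names));
}
```

With these three names, the lowercased key reproduces the exa ordering shown above and the uppercased key reproduces the ls -v ordering. A plain byte comparison with no folding would also match ls -v here, since the other two names start with capital letters.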
There is no easy way to sort according to a locale in Rust. Sorting is simply handled by Natord.
I’d like to see some collation library in pure Rust or good bindings to ICU, but until then I don’t think we can do anything about it.
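To illustrate why the accented names end up where they do without locale collation, here is a minimal sketch that uses only the standard library's string ordering. It is a stand-in, not Natord's actual code: Rust compares str values byte by byte, which for valid UTF-8 matches code point order, so á (U+00E1) sorts after every unaccented ASCII letter. The function name sort_by_code_point is hypothetical.

```rust
// Stand-in for locale-unaware sorting using std only; this is not natord.

/// Sort names with the default `str` ordering (byte-wise, which for UTF-8
/// is the same as Unicode code point order).
fn sort_by_code_point<'a>(names: &[&'a str]) -> Vec<&'a str> {
    let mut sorted = names.to_vec();
    sorted.sort();
    sorted
}

fn main() {
    let names = [
        "área de trabalho",
        "documentários",
        "documentos",
        "downloads",
    ];
    // 'o' (U+006F) compares below 'á' (U+00E1), so "documentos" comes
    // before "documentários", and "área de trabalho" lands last.
    for name in sort_by_code_point(&names) {
        println!("{}", name);
    }
}
```

For these four names the result matches the exa -l listing from the original report, which is what a pt_BR collation would be expected to fix.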
Oof, as someone who also uses a non-C locale, this unfortunately means exa isn't a suitable ls replacement yet.
ICU4X has been announced and could be the solution to this problem. It will probably be a long while until a production-ready version is released, though.
Note that this bug has been reported to Debian as well: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=950862
(I know this is a closed issue, but figured I'd toss a quick update in here for anyone looking through this later on.)
ICU4X 0.4 was released 2021-11-01, and the current roadmap and requirements doc project a 1.0 release around Q2 of 2022.
It’s not a closed issue at all. I would be thrilled to integrate ICU4X into exa, if it matches our needs. Unfortunately, it seems that there’s still no support for anything related to collation, so even if we use it for other things (like datetime), this issue won’t be solved anytime soon, I’m afraid.
Edit: ah, I see that a Collator component is on their roadmap for 0.5, not sure when it’ll happen but yeah hopefully we can use that sometime in 2022 :crossed_fingers:
ICU4X 0.6 was released and 1.0 is in beta. The changelog doesn't mention collation, but the source repository seems to include references to it. Do you know if it's usable by now?
The ICU4X Collator component work keeps getting bumped and didn't make it into the 0.6 release. It's currently slated for 1.0, but the current 1.0-beta only includes a partial implementation, so we'll just have to see.