lsd
thousands separator for the --size=bytes option would be very useful
- OS: Linux 5.12.9-1-MANJARO x86_64 GNU/Linux
- lsd --version: lsd 0.20.1
- echo $TERM: xterm-256color
- echo $LS_COLORS: rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.zst=01;31:.tzst=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.wim=01;31:.swm=01;31:.dwm=01;31:.esd=01;31:.jpg=01;35:.jpeg=01;35:.mjpg=01;35:.mjpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.webp=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.oga=00;36:.opus=00;36:.spx=00;36:.xspf=00;36:
Expected behavior:
It is very hard to quickly interpret/recognize the real file/directory sizes when the --size=bytes option is given:
>lsd --size=bytes --sort=size --reverse
.rw-r--r-- p p 1 Mon Jun 28 19:12:04 2021 file_1.dat
.rw-r--r-- p p 12 Mon Jun 28 19:12:04 2021 file_2.dat
.rw-r--r-- p p 123 Mon Jun 28 19:12:04 2021 file_3.dat
.rw-r--r-- p p 1234 Mon Jun 28 19:12:04 2021 file_4.dat
.rw-r--r-- p p 12345 Mon Jun 28 19:12:04 2021 file_5.dat
.rw-r--r-- p p 123456 Mon Jun 28 19:12:04 2021 file_6.dat
.rw-r--r-- p p 1234567 Mon Jun 28 19:12:04 2021 file_7.dat
.rw-r--r-- p p 12345678 Mon Jun 28 19:12:04 2021 file_8.dat
.rw-r--r-- p p 123456789 Mon Jun 28 19:12:04 2021 file_9.dat
.rw-r--r-- p p 1234567890 Mon Jun 28 19:12:06 2021 file_10.dat
However, the similar tool exa ( https://github.com/ogham/exa ) includes thousands separators by default:
>exa --bytes --long --sort=size
.rw-r--r-- 1 p 28 Jun 19:12 file_1.dat
.rw-r--r-- 12 p 28 Jun 19:12 file_2.dat
.rw-r--r-- 123 p 28 Jun 19:12 file_3.dat
.rw-r--r-- 1,234 p 28 Jun 19:12 file_4.dat
.rw-r--r-- 12,345 p 28 Jun 19:12 file_5.dat
.rw-r--r-- 123,456 p 28 Jun 19:12 file_6.dat
.rw-r--r-- 1,234,567 p 28 Jun 19:12 file_7.dat
.rw-r--r-- 12,345,678 p 28 Jun 19:12 file_8.dat
.rw-r--r-- 123,456,789 p 28 Jun 19:12 file_9.dat
.rw-r--r-- 1,234,567,890 p 28 Jun 19:12 file_10.dat
Actual behavior:
Extra cognitive load without those thousands separators :(
This might not be a good idea. It would cause issues for people who use lsd in a script and grep for the size column. I don't think breaking compatibility with GNU ls here would be a good idea.
Well, I really wanted to describe what to achieve, not how to achieve it.
I also agree: a previous ticket came from someone who used awk to parse lsd's output, and the parsing failed because of spaces (or other separators): https://github.com/Peltoche/lsd/issues/254#issuecomment-517011212
But there is a very easy way out, which solves all aspects of the problem:
- do not (ever) add thousands separators when the --size=bytes option is used
- only add thousands separators when a new suboption is used, e.g. --size=bytes_with_thousands_separator
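The opt-in rule in the two points above can be sketched roughly as follows. This is an illustrative Python sketch, not lsd's actual (Rust) code; the function name `format_size` and the mode name `bytes_with_separators` are assumptions made up for the example.

```python
def format_size(size_bytes: int, mode: str = "bytes") -> str:
    """Render a byte count for the size column.

    "bytes" stays untouched so scripts that parse it keep working.
    "bytes_with_separators" is a hypothetical opt-in value, not an
    existing lsd option.
    """
    if mode == "bytes":
        # Plain digits only, same as today.
        return str(size_bytes)
    if mode == "bytes_with_separators":
        # Python's "," format spec groups digits in threes.
        return f"{size_bytes:,}"
    raise ValueError(f"unknown size mode: {mode}")


if __name__ == "__main__":
    for size in (123, 1234, 1234567890):
        print(format_size(size), format_size(size, "bytes_with_separators"))
```

With this split, existing `--size=bytes` users see no change, and only users who ask for the new value get grouped digits.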
I hope that clears it up :)
Just wondering what a good option name would be? 🤔 bytes_with_thousands_separator is a bit too long. Or maybe even a separate option like --num-separators, which someone can set to on, off, or auto, where auto disables separators if we detect a pipe?
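A rough sketch of how such an on/off/auto setting could be resolved, assuming "detect a pipe" means "stdout is not a terminal". This is illustrative Python, not lsd code; `use_separators` is a hypothetical helper and `--num-separators` is only the option proposed above.

```python
import sys


def use_separators(setting: str, stream=sys.stdout) -> bool:
    """Decide whether to print digit separators.

    `setting` mirrors the proposed (hypothetical) values on/off/auto.
    In auto mode, separators are enabled only when `stream` is a
    terminal, so piped output stays easy to parse.
    """
    if setting == "on":
        return True
    if setting == "off":
        return False
    if setting == "auto":
        return stream.isatty()
    raise ValueError(f"unknown setting: {setting}")


if __name__ == "__main__":
    size = 1234567
    text = f"{size:,}" if use_separators("auto") else str(size)
    print(text)
```

Run directly in a terminal, the script prints grouped digits; piped into another command, it prints plain digits.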
It is perfectly up to you and up to the project owners, other contributors, etc. how to do it.
For me even the --size=fancy_bytes works :)
I would vote for a separate flag, --num-separators, as we could apply the separator to B, MB, and GB values, and maybe even to UNIX timestamps.
Not sure it would be useful for MB/GB etc., as those roll over to the next unit at around a thousand. As for UNIX timestamps, I don't think a comma in a timestamp looks natural. Nobody really reads a timestamp.
Oh, my bad, I did not notice that there is no MB or GB option for size.
Also, it feels a little awkward to be the only one reading timestamps 😅.
But as the --num-separators option would only affect the byte size, an option value for --size seems reasonable.
Localization might have to be considered here as well, as some countries use dots for separating thousands. Not sure if that's a real problem though.
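To illustrate the localization point, here is a minimal Python sketch in which the separator character is looked up per locale. The table is hand-written for the example and is an assumption; a real implementation would query the system locale instead.

```python
# Hypothetical, hand-written table for illustration only.
SEPARATORS = {
    "en_US": ",",  # 1,234,567
    "de_DE": ".",  # 1.234.567
}


def group_thousands(n: int, locale_name: str = "en_US") -> str:
    """Group digits in threes using the separator for `locale_name`.

    Unknown locales fall back to a comma.
    """
    sep = SEPARATORS.get(locale_name, ",")
    # Format with commas first, then swap in the locale's separator.
    return f"{n:,}".replace(",", sep)


if __name__ == "__main__":
    print(group_thousands(1234567, "en_US"))
    print(group_thousands(1234567, "de_DE"))
```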
Hi, I was thinking about this issue and I have two questions:
- System-specific localization - there is the num_format library, which could provide us with system-specific formatting; unfortunately, on Windows it requires Clang. Is that a problem? Could the Windows build be adjusted to deal with that?
- Flags discussion - personally I'd rather add an option value for the --size flag, named bytes_with_separators. Are there any objections?
The solution you bring up actually sounds pretty good. Also, the word thousands does not make sense anyway. I forgot that in my country we actually separate by hundreds after the first set 😂. bytes-with-separator seems to be a good flag value.
That said, I am not a big fan of adding clang as a dependency, and just for Windows at that. None of the maintainers, as far as I know, use Windows, and adding more brittleness to that platform is probably gonna make things worse.
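The non-uniform grouping mentioned above (a first group of three digits, then groups of two) can be sketched as follows. The comment does not name the numbering system, so this assumes the Indian convention; `group_indian` is a hypothetical helper written in Python for illustration.

```python
def group_indian(n: int, sep: str = ",") -> str:
    """Group digits as 3 on the right, then 2 at a time to the left.

    Assumes the Indian digit-grouping convention (an assumption,
    the thread does not name the system).
    """
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    # Split off the rightmost three digits, then chunk the rest in twos.
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return sign + sep.join(groups + [tail])


if __name__ == "__main__":
    for size in (1234, 123456, 1234567890):
        print(group_indian(size))
```

This is one reason a name like bytes_with_separators reads better than one that mentions thousands.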