coreutils icon indicating copy to clipboard operation
coreutils copied to clipboard

sort: -S/--buffer-size does not support memory percentages %

Open zacchiro opened this issue 2 years ago • 10 comments

GNU sort v. Rust sort:

$ /usr/bin/sort --buffer-size=50% < /dev/null
$ target/release/coreutils sort --buffer-size=50% < /dev/null
sort: invalid --buffer-size argument '50%'

Relevant excerpts from GNU sort manpage:

       -S, --buffer-size=SIZE
              use SIZE for main memory buffer

[...]

       SIZE may be followed by the following multiplicative suffixes: % 1% of memory, b 1, K 1024 (default), and so on for M, G, T, P, E, Z, Y.

(this could be a good first issue)

zacchiro avatar May 06 '22 19:05 zacchiro

There is an even bigger problem here, which is that sort precomputes data for each line that will be sorted (this makes comparing them faster). Due to that the exact amount of memory usage depends on the number of lines in a chunk (shorter lines -> more lines in a chunk -> more memory usage). Right now, when you specify a buffer size, the actual memory usage will be less or more than that. I think we should first solve that problem.

miDeb avatar May 06 '22 20:05 miDeb

@miDeb I wasn't aware of that issue, but it looks to me the two can be fixed independently from one another.

zacchiro avatar May 07 '22 07:05 zacchiro

I will take a look at this one (only the percentage parsing for now)

sallto avatar Oct 11 '22 10:10 sallto

Is this open for contribution?

zeroishero avatar Nov 06 '23 23:11 zeroishero

Sure, go ahead!

tertsdiepraam avatar Nov 07 '23 08:11 tertsdiepraam

Is it ok to use sysinfo crate to get memory size?

zeroishero avatar Nov 07 '23 17:11 zeroishero

@zeroishero I think we already use that crate in some utils, so should be fine!

tertsdiepraam avatar Nov 07 '23 18:11 tertsdiepraam

Ok, then. I can probably do this. I will start it soon.

zeroishero avatar Nov 07 '23 18:11 zeroishero

@tertsdiepraam I don't see sysinfo being used. I meant this crate: https://crates.io/crates/sysinfo

zeroishero avatar Nov 08 '23 01:11 zeroishero

Hmmm, I was sure we were using that somewhere, but maybe that was some other repo. If it's hard to get the info some other way, it might be fine to use that. You can make a PR with that first and then we'll figure out what to do. The crate is by Guillaume Gomez, so I definitely trust it 😄

tertsdiepraam avatar Nov 09 '23 08:11 tertsdiepraam