coreutils
coreutils copied to clipboard
sort: -S/--buffer-size does not support memory percentages %
GNU sort v. Rust sort:
$ /usr/bin/sort --buffer-size=50% < /dev/null
$ target/release/coreutils sort --buffer-size=50% < /dev/null
sort: invalid --buffer-size argument '50%'
Relevant excerpts from GNU sort manpage:
-S, --buffer-size=SIZE
use SIZE for main memory buffer
[...]
SIZE may be followed by the following multiplicative suffixes: % 1% of memory, b 1, K 1024 (default), and so on for M, G, T, P, E, Z, Y.
(this could be a good first issue)
There is an even bigger problem here, which is that sort
precomputes data for each line that will be sorted (this makes comparing them faster). Due to that the exact amount of memory usage depends on the number of lines in a chunk (shorter lines -> more lines in a chunk -> more memory usage).
Right now, when you specify a buffer size, the actual memory usage will be less or more than that. I think we should first solve that problem.
@miDeb I wasn't aware of that issue, but it looks to me the two can be fixed independently from one another.
I will take a look at this one (only the percentage parsing for now)
Is this open for contribution?
Sure, go ahead!
Is it ok to use sysinfo crate to get memory size?
@zeroishero I think we already use that crate in some utils, so should be fine!
Ok, then. I can probably do this. I will start it soon.
@tertsdiepraam I don't see sysinfo being used. I meant this crate: https://crates.io/crates/sysinfo
Hmmm, I was sure we were using that somewhere, but maybe that was some other repo. If it's hard to get the info some other way, it might be fine to use that. You can make a PR with that first and then we'll figure out what to do. The crate is by Guillaume Gomez, so I definitely trust it 😄