Move ExtendedBigDecimal to uucore/format, make use of it in formatting functions
Okay, this one is a bit of a large PR... I can split it up if needed (e.g. everything up to uucode: format: Change Formatter to take an &ExtendedBigDecimal is fairly mechanical). Also, I'm not too worried about having to rewrite stuff if needed (one of my purpose to contribute is to learn about Rust ,-)), so please let me know ,-)
The idea here is to switch formatting functions to use ExtendedBigDecimal, instead of f64. This allows us to get arbitrary precision with custom format in seq:
cargo run seq -f "%.32f" 0.1 1 1
0.10000000000000000000000000000000
Note that that doesn't match what coreutils outputs on x86, since they use an 80-bit int:
0.10000000000000000000135525271561
But that's better than before this change, where we used a f64:
0.10000000000000000555111512312578
At least now we have all the digits, so we could find ways to trim the floating point precision back to match 80-bit float if we wanted to.
We also take it as an opportunity to (partially) fix the hexadecimal format:
cargo run seq -f "%.32a" 0.1 1 1
0xc.cccccccccccccccccccccccccccccccdp-7
Again, doesn't fully match what coreutils does, but we could trim the precision:
0xc.ccccccccccccccd00000000000000000p-7
Note that I'm using seq and not printf, as the parsing code still uses f64... There's also quite a bit of potential for this change on the parsing side: I think a lot of custom logic in seq could then be removed.
seq: Make use of uucore::format to print in all cases
Now that uucore format functions take in an ExtendedBigDecimal, we can use those in all cases.
uucore: format: Pad non-finite numbers with spaces, not zeros
printf "%05.2f" inf should print inf, not 00inf.
Add a test to cover that case, too.
uucode: format: format_float_hexadecimal: Take in &BigDecimal
Display hexadecimal floats with arbitrary precision.
Note that some of the logic will produce extremely large BitInt as intermediate values: there is some optimization possible here, but the current implementation appears to work fine for reasonable numbers (e.g. whatever would previously fit in a f64, and even with somewhat large precision).
uucode: format: format_float_shortest: Take in &BigDecimal
Similar logic to scientific printing. Also add a few more tests around corner cases where we switch from decimal to scientific printing.
uucode: format: format_float_scientific: Take in &BigDecimal
No more f64 operations needed, we just trim (or extend) BigDecimal to appropriate precision, get the digits as a string, then add the decimal point.
Similar to what BigDecimal::write_scientific_notation does, but we need a little bit more control.
uucode: format: format_float_decimal: Take in &BigDecimal
Also add a few unit tests to make sure precision is not lost anymore.
uucode: format: format_float_non_finite: Take in &ExtendedBigDecimal
First modify Format.fmt to extract absolute value and sign, then modify printing on non-finite values (inf or nan).
uucode: format: Change Formatter to take an &ExtendedBigDecimal
Only changes the external interface, right now the number is casted back to f64 for printing. We'll update that in follow-up.
uucore: format: extendedbigdecimal: Implement From
Allows easier conversion.
uucore: format: extendedbigdecimal: Add MinusNan
Some test cases require to handle "negative" NaN. Handle it similarly to "positive" NaN.
uucore: format: Make Formatter a generic
Using an associated type in Formatter trait was quite nice, but, in a follow-up change, we'd like to pass a reference to the Float Formatter, while just passing i64/u64 as a value to the Int formatters. Associated type doesn't allow for that, so we turn it into a generic instead.
This makes Format<> a bit more complicated though, as we need to specify both the Formatter, and the type to be formatted.
seq: Move extendedbigdecimal.rs to uucore/features/format
Will make it possible to directly print ExtendedBigDecimal in seq,
and gradually get rid of limited f64 precision in other tools
(e.g. printf).
Changes are mostly mechanical, we reexport ExtendedBigDecimal directly in format to keep the imports slightly shorter.
GNU testsuite comparison:
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
why do move it to uucore? it seems that it is only used b seq, no ?
I should have made this a bit more clear, this is just the first step in many.
why do move it to uucore? it seems that it is only used b seq, no ?
Correct, it was only used in seq, but now printf uses it too, for formatting only (no parsing yet). Also, in seq, I want to be able to directly format ExtendedBigDecimal when a format -f ... is specified on the command line (currently seq bypasses uucore/format for the default format, maybe we can simplify this already, I can look).
The eventual idea is to start using ExtendedBigDecimal in all floating number parsing and formatting (that's seq and printf yes). By using f64 currently, we lose digits compared to GNU coreutils implementations (which uses 64/80/128-bit floats depending on the native long double size), so we can't properly do hex float formatting in seq or printf (#7364), or actually, any format with high precision (e.g. %.24f).
The output is still different from GNU coreutils because:
printfusesuucore/formatparsing code that still returnsf64. Soprintf "%.24f" 0.1 => 0.100000000000000005551115while GNU coreutils gives us0.100000000000000000001355. We need to convert the parsing code to returnExtendedBigDecimalas well (we'd then get a clean0.100000000000000000000000, see next point)seqdoes parseExtendedBigDecimal, and now printsExtendedBigDecimaleven when a format is specified, so it does output numbers that are more precise that GNU coreutils on any platforms (but now that we have the precision, we could trim it down).
Hope this makes sense ,-)
Added 2 commits, to make full use of the new formatting function in seq:
uucore: format: Pad non-finite numbers with spaces, not zerosthis is somethingseqdid right, but notprintf(i.e.uucore/format), so it's nice that both utilities use the same printing function now.seq: Make use of uucore::format to print in all cases: That removes quite a bit of code that was, in some way, duplicated betweenuucoreandseq.
(spotted #7466... for another PR)
GNU testsuite comparison:
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
Did you look at performances ? :)
seq-main is main branch, ./target/release/seq is this branch.
GNU coreutils is much faster in most cases, so mostly focusing on main vs this branch:
4% worse than main on integers:
hyperfine "./target/release/seq 1000000" "./seq-main 1000000" "seq 1000000"
Benchmark 1: ./target/release/seq 1000000
Time (mean ± σ): 309.6 ms ± 4.9 ms [User: 252.6 ms, System: 56.4 ms]
Range (min … max): 303.7 ms … 318.0 ms 10 runs
Benchmark 2: ./seq-main 1000000
Time (mean ± σ): 297.0 ms ± 6.2 ms [User: 238.1 ms, System: 58.3 ms]
Range (min … max): 289.3 ms … 312.8 ms 10 runs
Benchmark 3: seq 1000000
Time (mean ± σ): 10.3 ms ± 2.1 ms [User: 9.5 ms, System: 0.8 ms]
Range (min … max): 5.0 ms … 13.6 ms 357 runs
Summary
seq 1000000 ran
28.77 ± 5.92 times faster than ./seq-main 1000000
29.99 ± 6.16 times faster than ./target/release/seq 1000000
~15% worse on floats with default format:
hyperfine "./target/release/seq 0 0.000001 1" "./seq-main 0 0.000001 1" "seq 0 0.000001 1"
Benchmark 1: ./target/release/seq 0 0.000001 1
Time (mean ± σ): 336.4 ms ± 4.4 ms [User: 277.1 ms, System: 58.5 ms]
Range (min … max): 332.2 ms … 347.4 ms 10 runs
Benchmark 2: ./seq-main 0 0.000001 1
Time (mean ± σ): 297.0 ms ± 2.0 ms [User: 233.3 ms, System: 63.1 ms]
Range (min … max): 294.2 ms … 300.9 ms 10 runs
Benchmark 3: seq 0 0.000001 1
Time (mean ± σ): 111.8 ms ± 3.2 ms [User: 111.3 ms, System: 0.5 ms]
Range (min … max): 108.4 ms … 119.5 ms 26 runs
Summary
seq 0 0.000001 1 ran
2.66 ± 0.08 times faster than ./seq-main 0 0.000001 1
3.01 ± 0.10 times faster than ./target/release/seq 0 0.000001 1
but 2+ times better on custom float formats, especially with large precision:
hyperfine "./target/release/seq -f'%.24f' 0 0.000001 1" "./seq-main -f'%.24f' 0 0.000001 1" "seq -f'%.24f' 0 0.000001 1"
Benchmark 1: ./target/release/seq -f'%.24f' 0 0.000001 1
Time (mean ± σ): 361.6 ms ± 8.9 ms [User: 303.7 ms, System: 57.1 ms]
Range (min … max): 353.9 ms … 383.0 ms 10 runs
Benchmark 2: ./seq-main -f'%.24f' 0 0.000001 1
Time (mean ± σ): 876.4 ms ± 2.4 ms [User: 808.3 ms, System: 66.2 ms]
Range (min … max): 874.2 ms … 881.8 ms 10 runs
Benchmark 3: seq -f'%.24f' 0 0.000001 1
Time (mean ± σ): 215.8 ms ± 5.1 ms [User: 214.8 ms, System: 0.6 ms]
Range (min … max): 209.4 ms … 224.2 ms 13 runs
Summary
seq -f'%.24f' 0 0.000001 1 ran
1.68 ± 0.06 times faster than ./target/release/seq -f'%.24f' 0 0.000001 1
4.06 ± 0.10 times faster than ./seq-main -f'%.24f' 0 0.000001 1
I could make seq 10% faster than main for integers by adding a bypass in the formatting code:
diff --git a/src/uucore/src/lib/features/format/num_format.rs b/src/uucore/src/lib/features/format/num_format.rs
index 3a22fe0442f9..83543f14c863 100644
--- a/src/uucore/src/lib/features/format/num_format.rs
+++ b/src/uucore/src/lib/features/format/num_format.rs
@@ -356,6 +356,9 @@ fn format_float_decimal(bd: &BigDecimal, precision: usize, force_decimal: ForceD
debug_assert!(!bd.is_negative());
if precision == 0 && force_decimal == ForceDecimal::Yes {
format!("{bd:.0}.")
+ } else if bd.is_integer() {
+ let (bi, _) = bd.as_bigint_and_scale();
+ bi.to_str_radix(10)
} else {
format!("{bd:.precision$}")
}
I suspect there's a lot of small optimizations we can do. But maybe in this case seq should detect we are dealing with integers and use i64 (or a BigInt)? We could also fall back to f64 when the precision doesn't matter (e.g., if you print a few digits only...).
I'll try to look a bit more at floats with default format...
For the second case, we can reduce the slowdown to 5% with a bypass in write_output when width == 0:
diff --git a/src/uucore/src/lib/features/format/num_format.rs b/src/uucore/src/lib/features/format/num_format.rs
index 3a22fe0442f9..d8d60c8ba87f 100644
--- a/src/uucore/src/lib/features/format/num_format.rs
+++ b/src/uucore/src/lib/features/format/num_format.rs
@@ -614,6 +619,11 @@ fn write_output(
width: usize,
alignment: NumberAlignment,
) -> std::io::Result<()> {
+ if width == 0 {
+ writer.write(sign_indicator.as_bytes())?;
+ writer.write(s.as_bytes())?;
+ return Ok(())
+ }
Happy to add those 2 small optimizations if needed.
yeah, probably better and don't hesitate to document your benchmark in the BENCHMARKING.md file
Done, also filed #7482 for the most egregious performance gap.
(also discovered that useful hyperfine syntax: hyperfine -L seq seq,target/release/seq "{seq} 1000000")
GNU testsuite comparison:
Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
GNU test failed: tests/seq/seq. tests/seq/seq is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/seq/seq-extra-number. tests/seq/seq-extra-number is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/seq/seq-precision. tests/seq/seq-precision is passing on 'main'. Maybe you have to rebase?
Congrats! The gnu test tests/misc/tee is no longer failing!
Forgot to run fmt/clippy... and didn't properly run tests... Fixed now.
Just for reference, benchmarks vs main: 25% better on integers, 5% slower on default floats:
$ taskset -c 0 hyperfine -L seq ./seq-main,target/release/seq "{seq} 1000000"
Benchmark 1: ./seq-main 1000000
Time (mean ± σ): 294.2 ms ± 5.5 ms [User: 231.2 ms, System: 61.0 ms]
Range (min … max): 288.9 ms … 305.3 ms 10 runs
Benchmark 2: target/release/seq 1000000
Time (mean ± σ): 235.1 ms ± 0.8 ms [User: 180.8 ms, System: 53.8 ms]
Range (min … max): 233.8 ms … 236.4 ms 12 runs
Summary
target/release/seq 1000000 ran
1.25 ± 0.02 times faster than ./seq-main 1000000
$ taskset -c 0 hyperfine -L seq ./seq-main,target/release/seq "{seq} 0 0.000001 1"
Benchmark 1: ./seq-main 0 0.000001 1
Time (mean ± σ): 294.5 ms ± 0.5 ms [User: 231.3 ms, System: 62.5 ms]
Range (min … max): 293.8 ms … 295.8 ms 10 runs
Benchmark 2: target/release/seq 0 0.000001 1
Time (mean ± σ): 310.0 ms ± 5.0 ms [User: 248.8 ms, System: 59.2 ms]
Range (min … max): 307.4 ms … 324.2 ms 10 runs
Summary
./seq-main 0 0.000001 1 ran
1.05 ± 0.02 times faster than target/release/seq 0 0.000001 1
GNU testsuite comparison:
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
(I don't think those 2 failures are caused by this PR)
GNU testsuite comparison:
GNU test failed: timeout/timeout.log. timeout/timeout.log is passing on 'main'. Maybe you have to rebase?
Congrats! The gnu test misc/stdbuf.log is no longer failing!
Just noticed that this PR regressed uppercase hexadecimal float printing, so I added a commit here to fix that: cd30627b04361732bb72e74e3e9506b9e0e61b1a , and add a test so that it doesn't happen again. Rebased, too.
I also opened #7514 to fix one more small issue, and add a bunch more tests (didn't want to push too much new stuff here).
GNU testsuite comparison:
Congrats! The gnu test timeout/timeout.log is no longer failing!
GNU testsuite comparison:
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)