seq: Performance optimization
Reading #6182, I noticed that most of the time spent running `cargo run seq 4e4000003 4e4000003` was in `BigUint::to_string()`.
Reading the code, I found it is called twice: once to get the actual string representation of the first number, and again on the last number, but only to get its length, discarding the actual result. This is fine for fairly small numbers, but its efficiency degrades significantly on larger ones.
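The pattern can be sketched as follows. This is an illustrative reconstruction, not the actual uutils code: it uses `u128` instead of `BigUint` so the example is self-contained, and the helper names are made up for this sketch.

```rust
// Illustrative sketch (not the actual uutils seq code): when the width of
// the last number is needed only for zero-padding, converting it to a
// string and keeping nothing but the length repeats an expensive
// number-to-decimal conversion. Converting once and reusing the result
// avoids the second conversion; on big integers that conversion dominates.
fn padded_slow(first: u128, last: u128) -> String {
    // Full conversion of `last`, only to read its length.
    let width = last.to_string().len();
    format!("{first:0width$}")
}

fn padded_fast(first: u128, last: u128) -> String {
    // Convert `last` once and reuse both the string and its length.
    let last_repr = last.to_string();
    let width = last_repr.len();
    format!("{first:0width$}")
}

fn main() {
    assert_eq!(padded_slow(7, 1000), "0007");
    assert_eq!(padded_fast(7, 1000), "0007");
    println!("ok");
}
```

For a `u128` the difference is negligible; the point is that for a `BigUint` with millions of digits, each `to_string()` call is itself very expensive, so eliminating the redundant one roughly halves the work.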
On my machine, this change resulted in a ~2x speedup on the case above, and marginally better performance on smaller cases.
```console
~/coreutils$ hyperfine -L seq target/release/coreutils,target/release/coreutils_old "{seq} seq 4e4000003 4e4000003"
Benchmark 1: target/release/coreutils seq 4e4000003 4e4000003
  Time (mean ± σ):     26.009 s ±  0.113 s    [User: 25.992 s, System: 0.015 s]
  Range (min … max):   25.909 s … 26.294 s    10 runs

Benchmark 2: target/release/coreutils_old seq 4e4000003 4e4000003
  Time (mean ± σ):     52.372 s ±  0.446 s    [User: 52.352 s, System: 0.017 s]
  Range (min … max):   51.815 s … 53.017 s    10 runs

Summary
  'target/release/coreutils seq 4e4000003 4e4000003' ran
    2.01 ± 0.02 times faster than 'target/release/coreutils_old seq 4e4000003 4e4000003'
```
GNU testsuite comparison:
Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Would be great to add your example to the benchmarks
In a separate PR please :)
Any reason for not using `n.checked_ilog10().unwrap_or(0) + 1`?
It seemed unnecessary given it is just a constant. If there is any benefit to this alternative, I am happy to refactor.
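For reference, a minimal sketch of the suggested idiom on a primitive integer type (whether this matches the exact constant in the PR is an assumption of this example): `checked_ilog10` yields the digit count directly, with `unwrap_or(0)` covering `n == 0`, and agrees with the length of the decimal string.

```rust
// Digit count via checked_ilog10, as suggested above. checked_ilog10
// returns None for 0, so unwrap_or(0) makes 0 report one digit.
fn digit_count(n: u128) -> usize {
    (n.checked_ilog10().unwrap_or(0) + 1) as usize
}

fn main() {
    // Agrees with the string length, including the n == 0 edge case.
    for n in [0u128, 1, 9, 10, 99, 100, 12345, u128::MAX] {
        assert_eq!(digit_count(n), n.to_string().len());
    }
    println!("ok");
}
```

`checked_ilog10` has been stable since Rust 1.67, so this trades a hand-rolled constant for a standard-library call with the same result.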
CodSpeed Performance Report
Merging #9557 will improve performance by 58.87%
Comparing FidelSch:seq-optimization (7b79b27) with main (7c62885)
Summary
⚡ 1 improvement
✅ 126 untouched
⏩ 6 skipped[^skipped]
Benchmarks breakdown
| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| ⚡ `seq_large_integers` | 2.2 ms | 1.4 ms | +58.87% |

[^skipped]: 6 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
Any idea why `tsort_input_parsing_heavy` regressed?
GNU testsuite comparison:
Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/tty/tty-eof is no longer failing!
> Any reason for not using `n.checked_ilog10().unwrap_or(0) + 1`?
>
> It seemed unnecessary given it is just a constant. If there is any benefit to this alternative, I am happy to refactor.
If I recall correctly, the performance is similar to your solution, and it's part of the language. Could you try pushing a commit to trigger the benchmarks with this solution instead?
note, upstream fails this way - what do you think we should do here?
```console
$ LANG=C /usr/bin/seq "4e4000003" "4e4000003"
seq: invalid floating point argument: '4e4000003'
Try '/usr/bin/seq --help' for more information.
```
After some digging, the maximum value GNU seq accepts seems to be 11e4931, equivalent to the maximum value representable by an 80-bit long double, which supports the theory that GNU uses this type in its implementation.
By using BigDecimal we significantly extend the representable range, so it makes sense that for huge numbers a direct comparison to GNU seq is not viable.