coreutils icon indicating copy to clipboard operation
coreutils copied to clipboard

seq: Performance optimization

Open FidelSch opened this issue 3 weeks ago • 9 comments

Reading #6182 I noticed that most of the time spent running cargo run seq 4e4000003 4e4000003 was just BigUint::to_string()

Reading the code I found it is being called twice, once to get the actual string representation of the first number, and again on the last number, but only to get its length; and discarding the actual result. This seems fine on fairly small numbers, but its efficiency degrades significantly on larger ones.

In my machine, this change resulted in a ~2x speedup on the mentioned case, and a marginal but seemingly better performance on smaller cases.

$ ~/coreutils$ hyperfine -L seq target/release/coreutils,target/release/coreutils_old "{seq} seq 4e4000003 4e4000003" 
Benchmark 1: target/release/coreutils seq 4e4000003 4e4000003
  Time (mean ± σ):     26.009 s ±  0.113 s    [User: 25.992 s, System: 0.015 s]
  Range (min … max):   25.909 s … 26.294 s    10 runs
 
Benchmark 2: target/release/coreutils_old seq 4e4000003 4e4000003
  Time (mean ± σ):     52.372 s ±  0.446 s    [User: 52.352 s, System: 0.017 s]
  Range (min … max):   51.815 s … 53.017 s    10 runs
 
Summary
  'target/release/coreutils seq 4e4000003 4e4000003' ran
    2.01 ± 0.02 times faster than 'target/release/coreutils_old seq 4e4000003 4e4000003'

FidelSch avatar Dec 03 '25 20:12 FidelSch

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

github-actions[bot] avatar Dec 03 '25 20:12 github-actions[bot]

Would be great to add your example to the benchmarks

ChrisDryden avatar Dec 04 '25 01:12 ChrisDryden

Would be great to add your example to the benchmarks

In a separate pr please :)

sylvestre avatar Dec 04 '25 06:12 sylvestre

Any reason for not using

n.checked_ilog10().unwrap_or(0) + 1

Seemed unnecessary given it is just a constant. If there is any benefit to this alternative I am happy to refactor.

FidelSch avatar Dec 04 '25 12:12 FidelSch

CodSpeed Performance Report

Merging #9557 will improve performances by 58.87%

Comparing FidelSch:seq-optimization (7b79b27) with main (7c62885)

Summary

⚡ 1 improvement
✅ 126 untouched
⏩ 6 skipped[^skipped]

Benchmarks breakdown

Benchmark BASE HEAD Change
seq_large_integers 2.2 ms 1.4 ms +58.87%
[^skipped]: 6 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

codspeed-hq[bot] avatar Dec 06 '25 15:12 codspeed-hq[bot]

any idea why tsort_input_parsing_heavy regressed ?

sylvestre avatar Dec 06 '25 15:12 sylvestre

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/tty/tty-eof is no longer failing!

github-actions[bot] avatar Dec 06 '25 15:12 github-actions[bot]

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)

github-actions[bot] avatar Dec 06 '25 17:12 github-actions[bot]

Any reason for not using

n.checked_ilog10().unwrap_or(0) + 1

Seemed unnecessary given it is just a constant. If there is any benefit to this alternative I am happy to refactor.

If I recall correctly the performance is similar to your solution and it's part of the language. Could you try sending a commit to trigger the benchmarks with this solution instead?

anastygnome avatar Dec 07 '25 13:12 anastygnome

note, upstream fails this way - what do you think we should do here?

$ LANG=C /usr/bin/seq "4e4000003" "4e4000003"
seq: invalid floating point argument: '4e4000003'
Try '/usr/bin/seq --help' for more information.

sylvestre avatar Dec 14 '25 09:12 sylvestre

After some digging, the maximum value seq accepts seems to be 11e4931, equivalent to the maximum value representable by an 80-bit long double; which supports the theory that GNU uses this type for their implementation. By using BigDecimal we are significantly extending the representable range, so it makes sense that for huge numbers a direct comparison to seq is not viable.

FidelSch avatar Dec 15 '25 12:12 FidelSch