coreutils
`seq 4e4000003 4e4000003` is causing an infinite loop
Looking at the OSS-Fuzz coverage for seq, I noticed that it wasn't producing much.
While writing a fuzz target for seq's number parsing, I noticed that:
cargo run seq 4e4000003 4e4000003
is running forever
while GNU is doing:
$ LANG=C /usr/bin/seq 4e4000003 4e4000003
/usr/bin/seq: invalid floating point argument: '4e4000003'
Identified with https://github.com/uutils/coreutils/pull/6183
It appears that this might be an issue with the BigDecimal crate.
When trying to parse the argument, it converts the string `4e4000003` into a string with a four followed by 4000003 zeroes.
Execution never makes it past `numbers[0].parse`.
yes, it is ;) you should write a test case and report a bug in the crate
But it seems like uutils/coreutils does the string conversion. For some reason, we're turning `4e4000003` into a string with a four followed by 4000003 zeroes? The issue still ends up at the BigDecimal crate not being able to handle that string. But the string we generate is still about 4MB in memory just for parsing a number. Seems excessive.
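To make the cost concrete, here is a minimal sketch of what such a naive expansion looks like. It is not the actual uutils parser: `split_exponent` and `expand_naively` are hypothetical helpers, and they only handle an unsigned integer mantissa with a non-negative exponent.

```rust
/// Split a literal such as "4e4000003" into mantissa digits and a decimal
/// exponent. Hypothetical helper for illustration, not the uutils code.
fn split_exponent(s: &str) -> Option<(&str, u64)> {
    let (mantissa, exp) = s.split_once(|c: char| c == 'e' || c == 'E')?;
    if mantissa.is_empty() || !mantissa.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let exp: u64 = exp.parse().ok()?;
    Some((mantissa, exp))
}

/// Naive expansion: write the exponent out as literal zeroes.
/// The result grows linearly with the exponent value.
fn expand_naively(s: &str) -> Option<String> {
    let (mantissa, exp) = split_exponent(s)?;
    let zeroes = usize::try_from(exp).ok()?;
    let mut out = String::with_capacity(mantissa.len() + zeroes);
    out.push_str(mantissa);
    out.extend(std::iter::repeat('0').take(zeroes));
    Some(out)
}
```

With this sketch, `expand_naively("4e4000003")` yields a string of 4000004 bytes, whereas `split_exponent` keeps the same information in a one-character slice and a single integer.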
yeah, maybe we are doing something wrong but bigdecimal might want to reject it directly
@maxer137 did you have a chance to look into this a bit more?
I tried running this command
$ time cargo run seq 4e4000003 4e4000003 > out.txt
Compiling coreutils v0.0.26 (/home/carbrex/uutils/coreutils)
Finished dev [unoptimized + debuginfo] target(s) in 6.31s
Running `target/debug/coreutils seq 4e4000003 4e4000003`
real 15.59s
user 14.28s
sys 1.25s
cpu 99%
This is the output. So it isn't actually an infinite loop, but it takes a long time to run. Still a deviation from GNU, though.
Upon further investigation, I found that GNU seq supports floating point only up to f128 (a 128-bit floating-point number). For example, `seq 11e4931 11e4931` works but `seq 12e4931 12e4931` throws an error.
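The boundary observed here (11e4931 accepted, 12e4931 rejected) is consistent with a format whose largest finite value is about 1.1897e4932, which holds for both x87 extended precision and IEEE binary128. Below is a rough sketch of such a range check on an unexpanded mantissa and exponent. `fits_extended_range` and the constants are hypothetical names, the limit is truncated to five digits, and this is not how GNU seq or uutils implement it.

```rust
/// Approximate largest finite value for a 15-bit-exponent format,
/// about 1.1897e4932 (truncated, so the check is only approximate).
const MAX_LEADING_DIGITS: &str = "11897";
const MAX_DECIMAL_EXPONENT: u64 = 4932;

/// Returns true if `digits * 10^exp` is at most roughly 1.1897e4932.
/// Assumes `digits` contains only ASCII digits (non-negative integer mantissa).
fn fits_extended_range(digits: &str, exp: u64) -> bool {
    let digits = digits.trim_start_matches('0');
    if digits.is_empty() {
        return true; // zero always fits
    }
    // Exponent of the value when written as d.ddd * 10^k.
    let sci_exp = exp.saturating_add(digits.len() as u64 - 1);
    if sci_exp != MAX_DECIMAL_EXPONENT {
        return sci_exp < MAX_DECIMAL_EXPONENT;
    }
    // Same order of magnitude: compare the leading digits, zero-padded.
    let leading: String = digits
        .chars()
        .chain(std::iter::repeat('0'))
        .take(MAX_LEADING_DIGITS.len())
        .collect();
    leading.as_str() <= MAX_LEADING_DIGITS
}
```

The check never builds the expanded number, so it stays cheap even for an input like `4e4000003`.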
Indeed. Using BigDecimal we are able to go up to much larger values than originally in GNU seq.
I have removed the zero padding from `parse_decimal_and_exponent` and `parse_decimal_no_exponent` in #6185.
This still allows very large numbers such as `4e4000003` to work, but it will be very slow.
This seems to be due to us comparing two very large BigDecimal numbers.
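For illustration only, here is a sketch of how two non-negative values kept as (digits, exponent) pairs can be ordered without materialising millions of digits. `Sci` and `cmp_sci` are hypothetical names, and this is not a description of what the bigdecimal crate does internally.

```rust
use std::cmp::Ordering;

/// A non-negative decimal `digits * 10^exp`, kept unexpanded (hypothetical type).
/// `digits` is assumed to contain only ASCII digits.
struct Sci<'a> {
    digits: &'a str,
    exp: u64,
}

/// Order two values by magnitude first, then by their significant digits.
fn cmp_sci(a: &Sci<'_>, b: &Sci<'_>) -> Ordering {
    let da = a.digits.trim_start_matches('0');
    let db = b.digits.trim_start_matches('0');
    match (da.is_empty(), db.is_empty()) {
        (true, true) => return Ordering::Equal,
        (true, false) => return Ordering::Less,
        (false, true) => return Ordering::Greater,
        (false, false) => {}
    }
    // Number of integer digits the expanded value would have.
    let mag_a = a.exp as u128 + da.len() as u128;
    let mag_b = b.exp as u128 + db.len() as u128;
    mag_a.cmp(&mag_b).then_with(|| {
        // Same magnitude: compare digit by digit, padding the shorter with zeroes.
        let n = da.len().max(db.len());
        let pa = da.bytes().chain(std::iter::repeat(b'0')).take(n);
        let pb = db.bytes().chain(std::iter::repeat(b'0')).take(n);
        pa.cmp(pb)
    })
}
```

The work done here depends on the number of mantissa digits rather than on the size of the exponent, which is the property a fast comparison of `4e4000003` with itself would need.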
Looking at the bigdecimal crate, they are aware of this, as shown in this issue.
We could either parse the value into an f128 and then accept or reject it depending on whether it is a valid value, or accept that the uutils implementation can go beyond the range of GNU's implementation. The latter would deviate from GNU, but I feel there is a case to be made for extending the range seq supports.