coreutils

`seq 4e4000003 4e4000003` is causing an infinite loop

Open sylvestre opened this issue 10 months ago • 9 comments

Looking at the OSS-Fuzz coverage for seq, I noticed that it wasn't producing much. While writing a fuzz target for seq's number parsing, I noticed that `cargo run seq 4e4000003 4e4000003` runs forever,

while GNU seq rejects the argument:

$ LANG=C /usr/bin/seq 4e4000003 4e4000003
/usr/bin/seq: invalid floating point argument: '4e4000003'

sylvestre avatar Apr 02 '24 20:04 sylvestre

Identified with https://github.com/uutils/coreutils/pull/6183

sylvestre avatar Apr 02 '24 21:04 sylvestre

It appears that this might be an issue with the BigDecimal crate. When trying to parse the argument, the string 4e4000003 is converted into a string consisting of a four followed by 4000003 zeroes. Execution never makes it past numbers[0].parse.

maxer137 avatar Apr 03 '24 12:04 maxer137

yes, it is ;) you should write a test case and report a bug in the crate

sylvestre avatar Apr 03 '24 12:04 sylvestre

But it seems like uutils/coreutils does the string conversion itself. For some reason, we're turning 4e4000003 into a string with a four followed by 4000003 zeroes? The issue still ends up being that the BigDecimal crate can't handle that string, but the string we generate takes about 4 MB of memory just to parse one number. That seems excessive.
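
To make the size concrete, here is a minimal sketch of the expansion described above. The helper `expand_with_zero_padding` is hypothetical and only illustrates the idea; it is not the actual uutils parsing code.

```rust
/// Hypothetical helper (not the real uutils code): expands `<digits>e<exponent>`
/// into a plain digit string by appending `exponent` zeros to the digits.
fn expand_with_zero_padding(digits: &str, exponent: usize) -> String {
    // Reserve the full size up front: one byte per ASCII digit.
    let mut expanded = String::with_capacity(digits.len() + exponent);
    expanded.push_str(digits);
    expanded.extend(std::iter::repeat('0').take(exponent));
    expanded
}
```

Calling `expand_with_zero_padding("4", 4_000_003)` yields a string of 4,000,004 bytes, which is the roughly 4 MB mentioned above, before any arithmetic has been done on the value.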

maxer137 avatar Apr 03 '24 13:04 maxer137

yeah, maybe we are doing something wrong but bigdecimal might want to reject it directly

sylvestre avatar Apr 03 '24 13:04 sylvestre

@maxer137 did you have a chance to look into this a bit more?

sylvestre avatar Apr 11 '24 07:04 sylvestre

I tried running this command

$ time cargo run seq 4e4000003 4e4000003 > out.txt
   Compiling coreutils v0.0.26 (/home/carbrex/uutils/coreutils)
    Finished dev [unoptimized + debuginfo] target(s) in 6.31s
     Running `target/debug/coreutils seq 4e4000003 4e4000003`

real    15.59s
user    14.28s
sys     1.25s
cpu     99%

This is the output. So it isn't strictly an infinite loop, but it takes a long time to run, which is still a deviation from GNU.

Carbrex avatar Apr 11 '24 07:04 Carbrex

Upon further investigation, I found that seq supports floating point values only up to f128 (a 128-bit floating point number). For example, seq 11e4931 11e4931 works, but seq 12e4931 12e4931 throws an error.

Carbrex avatar Apr 11 '24 08:04 Carbrex

Indeed. Using BigDecimal we are able to go up to much larger values than originally in GNU seq.

I have removed the zero padding from parse_decimal_and_exponent and parse_decimal_no_exponent in #6185

This still seems to allow very large numbers such as 4e4000003 to work, but it will be very slow. That appears to be because we are comparing two very large BigDecimal numbers. The bigdecimal crate maintainers are aware of this, as shown in this issue.

We could either parse the value into an f128 and then accept or reject it depending on whether it is a valid value, or accept that the uutils implementation can go beyond the range of GNU's implementation. The latter would deviate from GNU, but I feel there is a case to be made for extending the range seq supports.
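
The first option could look roughly like the sketch below. It assumes `f64` as a stand-in, since stable Rust has no `f128` type, so the accepted range here is narrower than GNU's. The function name `parse_bounded` is hypothetical, and the error text simply mirrors the GNU message quoted at the top of this issue.

```rust
/// Hypothetical pre-check (not part of uutils): parse the argument as a
/// fixed-width float and reject it when the result is not finite.
/// `f64` is only a stand-in for a wider type such as f128.
fn parse_bounded(arg: &str) -> Result<f64, String> {
    match arg.trim().parse::<f64>() {
        // Rust parses an out-of-range literal to infinity rather than
        // returning an error, so finiteness has to be checked explicitly.
        Ok(value) if value.is_finite() => Ok(value),
        _ => Err(format!("invalid floating point argument: '{arg}'")),
    }
}
```

With this check, `parse_bounded("4e4000003")` returns an error immediately instead of building a multi-megabyte number, while `parse_bounded("4e3")` returns `Ok(4000.0)`. The trade-off is the one described above: anything outside the fixed-width range is rejected, even though BigDecimal could represent it.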

maxer137 avatar Apr 11 '24 09:04 maxer137