coreutils icon indicating copy to clipboard operation
coreutils copied to clipboard

printf: trim leading whitespace when parsing numeric values

Open dlrobertson opened this issue 8 months ago • 4 comments

Trim leading whitespace from numeric input to printf.

This is what seq does here. Note that the rust docs for trim_start state the following:

‘Whitespace’ is defined according to the terms of the Unicode Derived Core Property White_Space, which includes newlines.

So technically the following assert would pass

  assert_eq!(Ok(-2), ParsedNumber::parse_i64("\n\t -0x2"));

It looks like this does not currently impact printf or seq, but is probably something to be aware of.

$ cargo run printf "%f\n" " \n 0x1"
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.11s
     Running `target/debug/coreutils printf '%f\n' ' \n 0x1'`
printf: '\ \\n\ 0x1': expected a numeric value
0.000000

Fixes: #7505

dlrobertson avatar Mar 20 '25 18:03 dlrobertson

  1. That change should actually change printf behaviour (seq currently uses totally different parsing code, I'm trying to unify them...). Not sure why you don't see any change, but you really want to test that cargo run printf "%f\n" " 0x0" works. And maybe add an end-to-end test in test_printf.rs too.

  2. You really only want to trim spaces only, not other characters. At least that's what GNU coreutils seems to do: (I didn't test other unicode spaces ,-P):

$ env printf "%f\n" "\t0x1"
printf: ‘\\t0x1’: expected a numeric value
0.000000
$ env printf "%f\n" "\n0x1"
printf: ‘\\n0x1’: expected a numeric value
0.000000
$ env printf "%f\n" " 0x1"
1.000000

drinkcat avatar Mar 20 '25 19:03 drinkcat

  1. That change should actually change printf behaviour (seq currently uses totally different parsing code, I'm trying to unify them...). Not sure why you don't see any change, but you really want to test that cargo run printf "%f\n" " 0x0" works.

Sorry, my PR text was unclear. This PR does change printf behavior and fixes the issue, but I just wanted to make sure reviewers were aware that technically ParsedNumber::parse_<some integer> would trim all whitespace. So there is a semantic question there of is rejecting newlines the responsibility of parse_<some integer> or the responsibility of the caller etc? printf already throws errors for other whitespace like \n and \t.

And maybe add an end-to-end test in test_printf.rs too.

Yup! Good suggestion.

  1. You really only want to trim spaces only, not other characters. At least that's what GNU coreutils seems to do: (I didn't test other unicode spaces ,-P):
$ env printf "%f\n" "\t0x1"
printf: ‘\\t0x1’: expected a numeric value
0.000000
$ env printf "%f\n" "\n0x1"
printf: ‘\\n0x1’: expected a numeric value
0.000000
$ env printf "%f\n" " 0x1"
1.000000

👍 awesome. The current patch ends up with the same output, but there is still the question of should ParsedNumber::parse also reject non ascii 0x20 whitespace.

dlrobertson avatar Mar 20 '25 19:03 dlrobertson

  1. That change should actually change printf behaviour (seq currently uses totally different parsing code, I'm trying to unify them...). Not sure why you don't see any change, but you really want to test that cargo run printf "%f\n" " 0x0" works.

Sorry, my PR text was unclear. This PR does change printf behavior and fixes the issue, but I just wanted to make sure reviewers were aware that technically ParsedNumber::parse_<some integer> would trim all whitespace. So there is a semantic question there of is rejecting newlines the responsibility of parse_<some integer> or the responsibility of the caller etc? printf already throws errors for other whitespace like \n and \t.

Oh! I misunderstood yes ,-) Do you know what code is rejecting non-whitespace leading characters in printf? It's not super obvious from a very quick glance I had...

I feel I'd rather have the logic in the common parse function, seq will soon use that too, and I think we'll want to convert timeout to use the common parsing code too (#7475)... But if the rejection happens in some other shared code, maybe we can argue.

And maybe add an end-to-end test in test_printf.rs too.

Yup! Good suggestion.

  1. You really only want to trim spaces only, not other characters. At least that's what GNU coreutils seems to do: (I didn't test other unicode spaces ,-P):
$ env printf "%f\n" "\t0x1"
printf: ‘\\t0x1’: expected a numeric value
0.000000
$ env printf "%f\n" "\n0x1"
printf: ‘\\n0x1’: expected a numeric value
0.000000
$ env printf "%f\n" " 0x1"
1.000000

👍 awesome. The current patch ends up with the same output, but there is still the question of should ParsedNumber::parse also reject non ascii 0x20 whitespace.

Great ,-)

drinkcat avatar Mar 20 '25 20:03 drinkcat

Highly surprising, but seq and printf should actually both ignore all real whitespace :exploding_head:

Did some digging and ran some examples with GNU coreutils:

$ seq 1 " $(printf '\n\t 0x2')"
1
2
$ printf "%f\n" "$(printf '\n\t 0x2')"
2.000000

I've learned something new :laughing: I did not expect this to work.

Escape sequences are another story:

$ printf "%f\n" "\t\n 0x5"
-bash: printf: \t\n 0x5: invalid number
0.000000
$ seq 1 "\t\n 0x2"
seq: invalid floating point argument: ‘\\t\\n 0x2’
Try 'seq --help' for more information.

So I'll update the tests with some additions that cover more real whitespace types and add some tests to test_printf.rs.

dlrobertson avatar Mar 20 '25 20:03 dlrobertson

GNU testsuite comparison:

GNU test failed: timeout/timeout.log. timeout/timeout.log is passing on 'main'. Maybe you have to rebase?

github-actions[bot] avatar Mar 21 '25 06:03 github-actions[bot]

GNU testsuite comparison:

GNU test failed: misc/stdbuf.log. misc/stdbuf.log is passing on 'main'. Maybe you have to rebase?
GNU test failed: timeout/timeout.log. timeout/timeout.log is passing on 'main'. Maybe you have to rebase?

github-actions[bot] avatar Mar 21 '25 12:03 github-actions[bot]

Don't forget to run cargo fmt and cargo clippy.

Otherwise, LGTM!

(turns out I'll need this one to pass seq tests with the common parsing methods!)

drinkcat avatar Mar 21 '25 20:03 drinkcat

GNU testsuite comparison:

Congrats! The gnu test tests/tail/inotify-dir-recreate is no longer failing!

github-actions[bot] avatar Mar 24 '25 17:03 github-actions[bot]