coreutils icon indicating copy to clipboard operation
coreutils copied to clipboard

printf compatibility

Open samueltardieu opened this issue 5 months ago • 2 comments

This makes printf more compatible with its GNU coreutils counterpart.

  • format: %c prints the first character of a string
  • format: parse as many characters as possible in numbers and make the error messages more compatible with GNU coreutils

samueltardieu avatar Jan 04 '24 20:01 samueltardieu

About the organization

~The split between the printf application and core functionality seems weird sometimes, as error messages are emitted from the core library. Maybe it would make sense if the parsing functionalities were either brought back into the printf application or returned Result objects in order to be able to format error messages in the printf application specifically.~

~For example, printf uses ‘XXX’ quotes which AFAICT were not present in the core library; emitting the message from there (uucore::features::format::argument) forces to use those quotes in uucore.~

Edit: as noted by @tertsdiepraam the quote issue was an oversight on my part

About fuzzing

~The fuzzing now often fails because too many file descriptors are opened. I don't know if there is a descriptor leak in the fuzzing framework itself, or if this is due to the launch of the external GNU coreutils commands.~

Edit: leak fixed in #5787

samueltardieu avatar Jan 04 '24 22:01 samueltardieu

Re: quotes. These probably depend on locale. We chose to follow the C locale until we have proper locale support, which uses '.

tertsdiepraam avatar Jan 04 '24 23:01 tertsdiepraam

Re: quotes. These probably depend on locale. We chose to follow the C locale until we have proper locale support, which uses '.

Interesting. I only had set LC_NUMERIC=C, but by setting all I get different results in the fuzzer but not in the tests. I'll fix the quotes.

samueltardieu avatar Jan 05 '24 08:01 samueltardieu

Fixed quotes

samueltardieu avatar Jan 05 '24 08:01 samueltardieu

GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)

github-actions[bot] avatar Jan 05 '24 08:01 github-actions[bot]

I've restructured the code to make names more explicit, and added uucode tests to the partial parsing function.

samueltardieu avatar Jan 05 '24 09:01 samueltardieu

Hi! Thanks! I think this is mostly correct, but I think it should be implemented slightly differently. A problem with this implementation is that the base is lost. Here's GNU:

> printf %d 010b
printf: ‘010b’: value not completely converted
8

TIL I learned that coreutils inherited the "octal prefix mistake" from C.

So I think this is what should happen:

1. Remove the base prefix to determine the base.
2. Get the digits from the front.
3. If anything is left, print the error.
4. Convert the digits with `from_str_radix`.

Note that:

  • The "-" prefix must be extracted first as it happens earlier.
  • It has to account that some bases (dec/hex) can have a fractional part while others (bin/oct) cannot.

The code I implemented only deals with decimal numbers, as those are more complicated to parse (only one "-" optional prefix allowed, only one optional "." allowed, "-inf", "inf", "nan" — although those three aren't tied to any specific base they don't accept a base prefix).

This will also prevent us from trying to parse the same value multiple times, which feels a bit strange and requires us to keep both passes in sync.

Does that all make sense?

Yes it does. I see two alternatives:

  • Keep the parser I wrote for the decimal case. It is only called when the number could not be parsed as-is, so those situations can be considered unlikely. Handle other bases numbers differently.
  • Write a parser accomodating all cases and return the parsed value as well as the rest of the string if it cannot be parsed.

The second one makes more sense to me, I'll see what I can do and set this PR as draft in the meantime.

Thanks for your feedback.

samueltardieu avatar Jan 08 '24 09:01 samueltardieu

TIL I learned that coreutils inherited the "octal prefix mistake" from C.

Haha yeah and now we're stuck with it 😄

Write a parser accomodating all cases and return the parsed value as well as the rest of the string if it cannot be parsed.

That sounds great! Thank you!

tertsdiepraam avatar Jan 08 '24 09:01 tertsdiepraam

I have implemented a new number parser which can also cope with hexadecimal floats. I have put it in another module as to differentiate between the part which emits error messages and the one which only does some processing.

samueltardieu avatar Jan 08 '24 14:01 samueltardieu

As far as I know, the only remaining issue before we can activate the printf fuzzer (in matching stderr mode) is that double quotes are escaped when printed by uutils while they are not when printed by GNU coreutils. And also \c (suppress further output) doesn't seem to be honored properly.

samueltardieu avatar Jan 08 '24 14:01 samueltardieu

The latest push just adds a new test for the error message corresponding to a larger single character:

 $ target/debug/coreutils printf %d "'abc"
printf: warning: bc: character(s) following character constant have been ignored
97%

samueltardieu avatar Jan 08 '24 23:01 samueltardieu

GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)

github-actions[bot] avatar Jan 08 '24 23:01 github-actions[bot]

The failing chown test seems unrelated to those changes and fails for me the same way locally with and without this patch.

samueltardieu avatar Jan 09 '24 08:01 samueltardieu

Btw, are different parameters used for configuring Clippy? The windows running requires that I had a #[allow(clippy::cognitive_complexity)] to the parse function, while I cannot reproduce this on Linux.

samueltardieu avatar Jan 09 '24 11:01 samueltardieu

GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)

github-actions[bot] avatar Jan 09 '24 11:01 github-actions[bot]

The failing macos/x86_64 check seems unrelated.

samueltardieu avatar Jan 09 '24 14:01 samueltardieu

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

github-actions[bot] avatar Jan 10 '24 14:01 github-actions[bot]

Great! Thank you!

tertsdiepraam avatar Jan 10 '24 15:01 tertsdiepraam