cksum: Accept non UTF-8 inputs
Fixes #6573
GNU testsuite comparison:
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)
@BenWiederhake Could you take a look ? :)
@sylvestre all your comments are addressed in #6654. If that's still an issue, please tell me, I will refactor the code accordingly.
The other PR doesn't address sylvestre's points either (e.g. the linux-only issue)
The test is linux-only because os_str_as_bytes will automatically fail if given non-UTF-8 characters on windows.
This is due to the fact that there is no safe provided way to convert a OsStr to &[u8] on windows, and we are required to go through an intermediary &str conversion.
Improving the handling of windows filesystem exceptions this might be a further enhancement, but I don't think this should be the main point of this PR.
and seems to be a collection of many different PRs, which makes it unnecessarily difficult to review,
True, #6654 is a really big refactor, which solves many problems at once. I think about re-organizing all the changes in smaller PRs, while trying to keep the reasoning clear.
So which PR would you like us to review?
Let's review and merge this one first, I eventually added the changes requested which I originally put in #6654. Next, I will work on splitting #6654 into reasonably-sized changes, and try to document things correctly
As requested, I changed the code to handle non-UTF8 filenames, and I added a test for it.
Previously, I used String::from_utf8_lossy to display the filename on the terminal when needed. I realized this was wrong, because it actually inserts a REPLACEMENT CHARACTER U+FFFD for any non-UTF-8 sequence, when we actually want to omit them completely.
At first, I tried to copy-paste the implementation of String::from_utf8_lossy and adapt it, but I figured out, because of unstabilized internals, that this could not be done without copy-pasting even more code from the stdlib.
The compromise I went with is to just remove the U+FFFD chars from the output of String::from_utf8_lossy. It might produce unwanted behavior in case the character appears in the original filename, but I consider this to be unlikely enough to wait to fix it when the involved String's API is stable (MSRV 1.79 iirc).
GNU testsuite comparison:
Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)
GNU testsuite comparison:
Skipping an intermittent issue tests/rm/rm1 (passes in this run but fails in the 'main' branch)
@BenWiederhake If you want to take a look, this should finally be mergeable
GNU testsuite comparison:
GNU test failed: tests/timeout/timeout. tests/timeout/timeout is passing on 'main'. Maybe you have to rebase?
GNU testsuite comparison:
Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)
GNU test failed: tests/timeout/timeout. tests/timeout/timeout is passing on 'main'. Maybe you have to rebase?
I closing this PR in favor of these 2 new ones :
- comments handling (#6782)
- non-UTF-8 handling (#6793)