coreutils
ptx: special char breaks it with "thread 'main' panicked at 'assertion failed: end <= s.len()', src/uu/ptx/src/ptx.rs:291:5"
With foo.txt containing:
it’s disabled
The char ’ is key here
ptx -G foo.txt
thread 'main' panicked at 'assertion failed: end <= s.len()', src/uu/ptx/src/ptx.rs:291:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
So the root cause seems to be that read_input produces a FileContent that treats the string as a Vec
The ’ (U+2019) is a multi-byte UTF-8 character. It is a single char, but it is 3 bytes long, so offsets counted in chars and offsets counted in bytes no longer agree, and that breaks some assumptions in the code.
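For anyone following along, here is a minimal sketch of the byte/char mismatch using the line from foo.txt. The helper names (`byte_and_char_len`, `truncate_chars`) are made up for illustration and are not taken from ptx.rs.

```rust
/// Returns (length in bytes, length in chars) for a string.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Keeps at most `max_chars` characters, cutting only at a char boundary.
/// Hypothetical helper, not part of ptx.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        // `byte_idx` is the byte offset where the (max_chars + 1)-th char starts.
        Some((byte_idx, _)) => &s[..byte_idx],
        // The string has `max_chars` chars or fewer: keep all of it.
        None => s,
    }
}

fn main() {
    // Same text as foo.txt; \u{2019} is the ’ character.
    let line = "it\u{2019}s disabled";
    let (bytes, chars) = byte_and_char_len(line);
    println!("{bytes} bytes, {chars} chars");

    // Byte offset 3 falls inside the 3-byte encoding of ’ (bytes 2..5),
    // so slicing there with &line[..3] would panic.
    println!("offset 3 is a char boundary: {}", line.is_char_boundary(3));

    // Cutting by char count stays on a boundary.
    println!("{}", truncate_chars(line, 3));
}
```

Any code that computes an index in one unit and uses it in the other can end up past the end of the data or in the middle of a character.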
I will see if I can fix that while keeping the optimizations recently introduced. I plan to work on that in the next few days.
I have an idea on how to fix this: use iterators instead of Vec<char>.
Trimming whitespace, for example, could be done simply with skip_while.
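A rough sketch of what that could look like. `trim_leading_ws` is a hypothetical name, and the real code would probably keep working with the iterator instead of collecting into a String.

```rust
/// Drops leading whitespace by skipping chars, never touching byte offsets.
/// Hypothetical helper for illustration only.
fn trim_leading_ws(s: &str) -> String {
    s.chars().skip_while(|c| c.is_whitespace()).collect()
}

fn main() {
    // \u{2019} is the ’ character from the bug report.
    let trimmed = trim_leading_ws("  \tit\u{2019}s disabled");
    println!("{trimmed}");
}
```

For a plain &str, `str::trim_start` already does the same thing; the iterator version is only interesting if the rest of the pipeline is iterator-based too.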
Trimming broken words is more complicated, as we need to check for whitespace beyond the edge of the string. We need our own type that is like std::str::Chars but also keeps the character just beyond the beginning/end of the iterator.
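One possible shape for such a type, as a sketch only. `Window` and all of its methods are invented here for illustration; this stores byte offsets into the full line rather than wrapping std::str::Chars directly, which is an assumption about the design and not what ptx does.

```rust
/// A view into `full` that still remembers what lies outside its edges.
/// Hypothetical type, not from ptx.rs.
struct Window<'a> {
    full: &'a str,
    start: usize, // byte offset, always on a char boundary
    end: usize,   // byte offset, always on a char boundary
}

impl<'a> Window<'a> {
    /// Returns None if the offsets are out of order or split a character.
    fn new(full: &'a str, start: usize, end: usize) -> Option<Self> {
        if start <= end && full.is_char_boundary(start) && full.is_char_boundary(end) {
            Some(Window { full, start, end })
        } else {
            None
        }
    }

    fn as_str(&self) -> &'a str {
        &self.full[self.start..self.end]
    }

    /// The character just before the window, if any.
    fn char_before(&self) -> Option<char> {
        self.full[..self.start].chars().next_back()
    }

    /// The character just after the window, if any.
    fn char_after(&self) -> Option<char> {
        self.full[self.end..].chars().next()
    }

    /// True if the window starts in the middle of a word.
    fn starts_mid_word(&self) -> bool {
        let first_is_word = self.as_str().chars().next().map_or(false, |c| !c.is_whitespace());
        first_is_word && self.char_before().map_or(false, |c| !c.is_whitespace())
    }

    /// True if the window ends in the middle of a word.
    fn ends_mid_word(&self) -> bool {
        let last_is_word = self.as_str().chars().next_back().map_or(false, |c| !c.is_whitespace());
        last_is_word && self.char_after().map_or(false, |c| !c.is_whitespace())
    }
}

fn main() {
    let line = "it\u{2019}s disabled";
    // Bytes 2..6 cover "’s": ’ occupies bytes 2..5 and 's' is byte 5.
    let w = Window::new(line, 2, 6).expect("offsets are on char boundaries");
    println!("{:?} starts mid-word: {}, ends mid-word: {}",
             w.as_str(), w.starts_mid_word(), w.ends_mid_word());
}
```

Rejecting offsets that split a character in `new` means the slicing in the other methods cannot panic.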
For jumping backward and trimming from the right, we can leverage std::str::Chars being a DoubleEndedIterator (UTF-8 can be walked backward too, I just discovered).
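A small sketch of the backward walk. It uses `char_indices()`, which is double-ended just like `chars()`, so the byte offsets come along for free. Both function names are hypothetical.

```rust
/// Trims trailing whitespace by walking the string from the right.
/// Hypothetical helper, not from ptx.rs.
fn trim_trailing_ws(s: &str) -> &str {
    match s.char_indices().rev().find(|(_, c)| !c.is_whitespace()) {
        // Keep everything up to and including the last non-whitespace char.
        Some((idx, c)) => &s[..idx + c.len_utf8()],
        // Empty or all whitespace.
        None => "",
    }
}

/// Byte offset reached after stepping back `n` chars from the end.
/// Always lands on a char boundary; clamps to 0 if `n` is too large.
fn byte_offset_back(s: &str, n: usize) -> usize {
    if n == 0 {
        return s.len();
    }
    s.char_indices().rev().nth(n - 1).map_or(0, |(idx, _)| idx)
}

fn main() {
    // \u{2019} is the ’ character from the bug report.
    let s = "it\u{2019}s  ";
    println!("{:?}", trim_trailing_ws(s));

    let t = "it\u{2019}s";
    // Stepping back 2 chars skips 's' (1 byte) and ’ (3 bytes).
    let off = byte_offset_back(t, 2);
    println!("{:?}", &t[off..]);
}
```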
@sylvestre I've been making test cases. Should I add Unicode cases too or skip it until this is fixed? (adding it would cause failures even for things unrelated to ptx - I worry that might be annoying).
@wishawa are you working on your idea? You have more context than me, so if you are taking care of that I will work on something else.
No. I'm still working on the tests (and am slow at that lol). Feel free to go ahead and work on this one😃
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
It is still happening. As I don't think @wishawa is still working on it, others can work on it!