coreutils
coreutils copied to clipboard
csplit: rewrite with one function per pattern type
This pull request fixes several bugs in csplit by rewriting the code to have one function per pattern type. The functions are
up_to_line_repeatfor patterns like123 {45}up_to_line_alwaysfor123 {*}up_to_match_pos_offset_repeatfor/regex/+123 {45}up_to_match_pos_offset_alwaysfor/regex/+123 {*}up_to_match_neg_offset_repeatfor/regex/-123 {45}up_to_match_neg_offset_alwaysfor/regex/-123 {*}skip_to_match_pos_offset_repeatfor%regex%+123 {45}skip_to_match_pos_offset_alwaysfor%regex%+123 {*}skip_to_match_neg_offset_repeatfor%regex%-123 {45}skip_to_match_neg_offset_alwaysfor%regex%-123 {*}
The command-line options, global state, and per-chunk state live in the Splitter struct (which replaces the SplitWriter and InputSplitter structs). There are a ton of ugly corner cases where different types of patterns interact. I've tried to call those out with comments.
I added some tests that were already passing, but were missing test coverage, including
- same regex pattern twice in a row, as in
/foo/ /foo/ - line number out of range on empty input, as in
: | csplit - 1, - skip to match with negative offset repeating indefinitely, as in
csplit %foo%-5 {*}.
Some bugs fixed include:
- missing a regex match on the final line of a line number pattern, as in
seq 20 | csplit - 10 /10/, - missing output
0\nwhen input is a directory, - #7286, a missing empty file with
--suppress-matched.
Some bugs remain:
- The GNU test file
tests/csplit/csplit-suppress-matchedremains failing. I'm hoping that the change in this pull request will make it easier to reason about csplit and fix the remaining failures. - Probably related to the previous item, there is some weird behavior of GNU csplit that I could not understand. This was already partially covered by a test named
repeat_everything. I've extracted just the relevant part of that into its own test namedtest_repeat_line_after_match_suppress_matchedthat has TODO comments for the behavior that differs from GNU csplit.
Fixes #7286
GNU testsuite comparison:
GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
GNU testsuite comparison:
GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
It is still in draft mode :)
GNU testsuite comparison:
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
GNU testsuite comparison:
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
GNU testsuite comparison:
Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
Sorry, I was hoping to find a way to make this easier to review, but I couldn't. I will come back to this and rebase it
it has been draft for a long time, please reopen when ready