coreutils split: implement round-robin arg to --number

~Help wanted! This change was previously merged in pull request #3205. Unfortunately, it resulted in an issue with the GNU test suite; see issue #3268. The change was subsequently reverted in pull request #3269. I am re-creating this pull request hoping that someone can diagnose the issue with the GNU test case tests/split/filter.sh as described in issue #3269.~ That issue no longer seems to be present on this branch after I rebased.

Implement distributing lines of a file in a round-robin manner to a specified number of chunks. For example,

$ (seq 1 10 | split -n r/3) && head -v xa[abc]
==> xaa <==
1
4
7
10

==> xab <==
2
5
8

==> xac <==
3
6
9
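
A minimal Rust sketch of the distribution rule shown in the example above. It is not the code from this pull request: the function name `round_robin_lines` is hypothetical, and it collects lines in memory, whereas split writes them to output files.

```rust
/// Distribute the lines of `input` across `num_chunks` buckets in
/// round-robin order: line i (0-based) goes to bucket i % num_chunks.
///
/// Hypothetical helper for illustration only; the real `split -n r/N`
/// writes each bucket to its own output file (xaa, xab, ...).
fn round_robin_lines(input: &str, num_chunks: usize) -> Vec<Vec<&str>> {
    assert!(num_chunks > 0, "num_chunks must be positive");
    // One empty bucket per requested chunk.
    let mut chunks: Vec<Vec<&str>> = vec![Vec::new(); num_chunks];
    for (i, line) in input.lines().enumerate() {
        // Cycle through the buckets in order, wrapping around after the last one.
        chunks[i % num_chunks].push(line);
    }
    chunks
}
```

For the `seq 1 10` input with three chunks, the first bucket holds 1, 4, 7, 10, which matches the contents of xaa above.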

Mar 20 '22 13:03 jfinkels

I rebased this branch to take another look at this issue.

The summary from the GNU test workflow is misleading. It says:

Error: GNU test failed: tests/split/filter. tests/split/filter is passing on 'main'. Maybe you have to rebase?
Error: GNU test error: tests/split/filter. tests/split/filter is passing on 'main'. Maybe you have to rebase?

but tests/split/filter.sh is not passing on the main branch; it exits with an error:

2022-05-05T23:38:24.9436982Z filter.sh: set-up failure: 
2022-05-05T23:38:24.9437113Z ERROR tests/split/filter.sh (exit status: 99)

(from recent build on main branch: https://github.com/uutils/coreutils/runs/6314524034?check_suite_focus=true )

I believe that this is another case where implementing a feature lets one of the GNU tests run further than it previously could. Specifically, I believe that if we merge this, then there will be one fewer ERROR and one more FAIL in the GNU test suite. The issue we were previously seeing no longer seems to be present, so I think we should reconsider merging this new feature.

May 06 '22 01:05 jfinkels

If I understand correctly, we should change how we report the GNU changes? ERROR -> FAIL should probably be considered an improvement, instead of being reported as a problem.

May 06 '22 09:05 tertsdiepraam

> If I understand correctly, we should change how we report the GNU changes? ERROR -> FAIL should probably be considered an improvement, instead of being reported as a problem.

Yes, I believe that is correct. But I haven't looked at the code that summarizes the GNU test results so I'm not totally positive about that.
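
A minimal sketch of the reporting rule discussed here, under stated assumptions. The names `Status`, `Change`, and `classify` are hypothetical and are not taken from the script that summarizes the GNU test results. The ordering ERROR < FAIL < PASS is also an assumption, chosen so that ERROR -> FAIL counts as an improvement.

```rust
use std::cmp::Ordering;

/// Result of one GNU test (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Error,
    Fail,
    Pass,
}

/// How a test's result on the branch compares with its result on main.
#[derive(Debug, PartialEq, Eq)]
enum Change {
    Regression,
    Improvement,
    Unchanged,
}

impl Status {
    /// Assumed ordering: a test that runs far enough to FAIL has made
    /// more progress than one that stops with ERROR during set-up.
    fn rank(self) -> u8 {
        match self {
            Status::Error => 0,
            Status::Fail => 1,
            Status::Pass => 2,
        }
    }
}

/// Classify the transition from the status on main to the status on the branch.
fn classify(on_main: Status, on_branch: Status) -> Change {
    match on_branch.rank().cmp(&on_main.rank()) {
        Ordering::Greater => Change::Improvement,
        Ordering::Less => Change::Regression,
        Ordering::Equal => Change::Unchanged,
    }
}
```

With this rule, `classify(Status::Error, Status::Fail)` yields `Change::Improvement`, while a test that really was passing on main and now fails is still reported as a regression.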

May 07 '22 02:05 jfinkels

I rebased this pull request.

Error: GNU test failed: tests/misc/timeout. tests/misc/timeout is passing on 'main'. Maybe you have to rebase?
Error: GNU test failed: tests/split/filter. tests/split/filter is passing on 'main'. Maybe you have to rebase?
Warning: Congrats! The gnu test tests/split/filter is no longer ERROR!
Error: Process completed with exit code 255.

As I mentioned in a previous comment, the tests/split/filter.sh test is in the ERROR state on main, so the "is passing on 'main'" message is incorrect.
The tests/split/filter.sh test is still producing about one million lines of this message: shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
The tests/misc/timeout.sh test could be failing because tests/split/filter.sh spends about one minute printing those error messages, but I'm not sure.

May 27 '22 01:05 jfinkels

Fails with:

error[E0061]: this function takes 5 arguments but 4 arguments were supplied
    --> src/uu/split/src/split.rs:1233:33
     |
1233 |       let mut filename_iterator = FilenameIterator::new(
     |  _________________________________^^^^^^^^^^^^^^^^^^^^^-
1234 | |         &settings.prefix,
1235 | |         &settings.additional_suffix,
1236 | |         settings.suffix_length,
1237 | |         settings.suffix_type,
1238 | |     );
     | |_____- an argument of type `usize` is missing

Oct 20 '22 05:10 sylvestre

GNU testsuite comparison:

GNU test failed: tests/split/filter. tests/split/filter is passing on 'main'. Maybe you have to rebase?
Congrats! The gnu test tests/split/filter is no longer ERROR!

Oct 23 '22 04:10 github-actions[bot]

@sylvestre The tests/split/filter.sh test is still producing about one million lines of shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory. So, maybe we should skip that part of the test for the moment? I believe it is this part:

# Ensure that "endless" input _is_ processed for unbounded number of filters    
for buf in 1000 1000000; do
  returns_ 124 timeout .5 sh -c \
    "yes | split --filter='head -c1 >/dev/null' -b $buf" || fail=1
done

Oct 23 '22 14:10 jfinkels
