csvtk
feature request: csvtk split by number of lines per chunk
This is a feature request for the csvtk split command to have an additional --nlines option, so that it behaves similarly to GNU split --lines (https://www.gnu.org/software/coreutils/manual/html_node/split-invocation.html) but repeats the header row in every chunk.
E.g. we have a file with 5 entries:
a,b,c,d
1,2,3,4
2,3,4,5
3,4,5,6
4,5,6,7
5,6,7,8
We run csvtk split --nlines 2, which produces chunks of 2 entries per file, each with the header repeated:
##file1
a,b,c,d
1,2,3,4
2,3,4,5
##file2
a,b,c,d
3,4,5,6
4,5,6,7
##file3
a,b,c,d
5,6,7,8
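To make the requested behaviour concrete, here is a minimal Python sketch of the chunking logic. It is not csvtk code, and --nlines is only the option proposed in this issue; the function name `split_csv_rows` is hypothetical. It assumes the first row is the header and repeats it at the top of every chunk.

```python
import csv
import io
from itertools import islice


def split_csv_rows(text, nlines):
    """Yield CSV chunks: the header plus up to `nlines` data rows each.

    Hypothetical helper that illustrates the behaviour requested in this issue.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:  # empty input: nothing to split
        return
    while True:
        rows = list(islice(reader, nlines))  # take the next `nlines` rows
        if not rows:
            break
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)  # repeat the header in every chunk
        writer.writerows(rows)
        yield buf.getvalue()


sample = "a,b,c,d\n1,2,3,4\n2,3,4,5\n3,4,5,6\n4,5,6,7\n5,6,7,8\n"
chunks = list(split_csv_rows(sample, 2))
for i, chunk in enumerate(chunks, start=1):
    print(f"##file{i}")
    print(chunk, end="")
```

With the 5-entry file from the example and a chunk size of 2, this yields three chunks, the last one holding the single remaining row.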
Thanks in advance
I need this feature when working with very large CSV files, which I usually keep compressed with gzip or zstd (which decompresses significantly faster). For the moment, I use xsv from https://github.com/BurntSushi/xsv, which does exactly what is asked above. However, it only outputs uncompressed CSV chunks, and I haven't figured out a way to make it write chunks compressed with gzip or zstd. This feature would be a very useful addition to csvtk.
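As a possible stopgap, the compressed-in, compressed-out case can be sketched in Python with only the standard library. This is a hedged workaround, not something csvtk or xsv provides: the function name `split_csv_gz`, the `prefix` parameter, and the output naming scheme are all hypothetical, and it covers gzip only (zstd would need an additional library, which is not shown here).

```python
import csv
import gzip
import os
from itertools import islice


def split_csv_gz(src_path, out_dir, nlines, prefix="chunk"):
    """Split a gzip-compressed CSV into gzip-compressed chunks.

    Hypothetical helper: each chunk gets the header plus up to `nlines`
    data rows, and rows are streamed so the input is never fully in memory.
    """
    os.makedirs(out_dir, exist_ok=True)
    out_paths = []
    with gzip.open(src_path, "rt", newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty input: no chunks written
            return out_paths
        index = 0
        while True:
            rows = list(islice(reader, nlines))
            if not rows:
                break
            index += 1
            # Assumed naming scheme: chunk_0001.csv.gz, chunk_0002.csv.gz, ...
            out_path = os.path.join(out_dir, f"{prefix}_{index:04d}.csv.gz")
            with gzip.open(out_path, "wt", newline="") as dst:
                writer = csv.writer(dst, lineterminator="\n")
                writer.writerow(header)
                writer.writerows(rows)
            out_paths.append(out_path)
    return out_paths
```

Calling `split_csv_gz("data.csv.gz", "chunks", 2)` (file and directory names are placeholders) would write one `.csv.gz` file per 2 data rows and return their paths.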