
feature request: csvtk split by number of lines per chunk

Open • avilella opened this issue 3 years ago • 1 comment

This is a feature request for the csvtk split command to gain an additional --nlines option, so that it behaves like GNU coreutils split --lines (https://www.gnu.org/software/coreutils/manual/html_node/split-invocation.html) but handles the header row properly, i.e. repeats it at the top of every chunk.

For example, take a file with a header row and 5 records:

    a,b,c,d
    1,2,3,4
    2,3,4,5
    3,4,5,6
    4,5,6,7
    5,6,7,8

Running `csvtk split --nlines 2` would produce chunks of 2 records each, with the header repeated in every chunk:

    ## file1
    a,b,c,d
    1,2,3,4
    2,3,4,5
    ## file2
    a,b,c,d
    3,4,5,6
    4,5,6,7
    ## file3
    a,b,c,d
    5,6,7,8
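A minimal sketch of the requested behavior in Python, assuming the chunk size counts CSV records rather than physical lines (so a quoted field containing a newline stays in one chunk, unlike GNU `split --lines`). The function names `chunk_csv` and `write_chunks` and the `chunk_<i>.csv` naming scheme are hypothetical illustrations, not csvtk's API or implementation.

```python
import csv
import io
import os
from itertools import islice
from typing import Iterator, List, TextIO


def chunk_csv(src: TextIO, nlines: int) -> Iterator[List[List[str]]]:
    """Yield lists of rows: the header followed by at most `nlines` records."""
    if nlines < 1:
        raise ValueError("nlines must be >= 1")
    reader = csv.reader(src)  # parses quoted fields, including embedded newlines
    header = next(reader, None)
    if header is None:
        return  # empty input produces no chunks
    while True:
        batch = list(islice(reader, nlines))  # take the next nlines records
        if not batch:
            break
        yield [header] + batch  # repeat the header at the top of each chunk


def write_chunks(src: TextIO, nlines: int, outdir: str, prefix: str = "chunk") -> List[str]:
    """Write each chunk to <outdir>/<prefix>_<i>.csv (hypothetical naming) and return the paths."""
    paths = []
    for i, rows in enumerate(chunk_csv(src, nlines), start=1):
        path = os.path.join(outdir, f"{prefix}_{i}.csv")
        with open(path, "w", newline="") as out:
            csv.writer(out, lineterminator="\n").writerows(rows)
        paths.append(path)
    return paths


if __name__ == "__main__":
    data = "a,b,c,d\n1,2,3,4\n2,3,4,5\n3,4,5,6\n4,5,6,7\n5,6,7,8\n"
    for rows in chunk_csv(io.StringIO(data), 2):
        print(rows)
```

On the example file this yields three chunks holding 2, 2 and 1 records, each starting with the `a,b,c,d` header. Counting parsed records instead of raw lines is a deliberate choice here: splitting on raw newlines could cut a quoted multi-line field in half.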

Thanks in advance

avilella • Jan 19 '21 16:01

I need this feature when working with very large CSV files, which I usually keep compressed with gzip or zstd (the latter decompresses significantly faster). For now I use xsv (https://github.com/BurntSushi/xsv), which does exactly what is requested above. However, it only outputs uncompressed CSV chunks, and I haven't found a way to make it write chunks compressed with gzip or zstd. This feature would be a very useful addition to csvtk.
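For the compressed-chunk use case, a rough sketch using Python's standard `gzip` module: it streams records from a gzip-compressed input and writes each chunk straight into its own `.csv.gz` file, so only one output file is open at a time and memory use does not grow with input size. The function `split_gzip_csv` and the `chunk_<i>.csv.gz` names are hypothetical. zstd output is left out, since on most Python versions in use it needs a third-party binding.

```python
import csv
import gzip
import os
from typing import List


def split_gzip_csv(in_path: str, nlines: int, outdir: str, prefix: str = "chunk") -> List[str]:
    """Split a gzip-compressed CSV into gzip-compressed chunks of at most
    `nlines` records each, repeating the header in every chunk.
    Hypothetical helper for illustration; returns the chunk paths."""
    if nlines < 1:
        raise ValueError("nlines must be >= 1")
    paths: List[str] = []
    with gzip.open(in_path, "rt", newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return paths  # empty input: no chunks
        out = None
        writer = None
        count = 0  # records written to the current chunk
        try:
            for row in reader:
                if out is None or count == nlines:
                    # Current chunk is full (or none open yet): start a new one.
                    if out is not None:
                        out.close()
                    path = os.path.join(outdir, f"{prefix}_{len(paths) + 1}.csv.gz")
                    out = gzip.open(path, "wt", newline="")
                    writer = csv.writer(out, lineterminator="\n")
                    writer.writerow(header)
                    paths.append(path)
                    count = 0
                writer.writerow(row)
                count += 1
        finally:
            if out is not None:
                out.close()  # flush and finalize the last gzip member
    return paths
```

As written, a header-only input produces no chunks; whether a header-only chunk should be emitted instead is a design choice the tool would have to make.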

san-r • Jul 25 '21 09:07