csvkit
Option to limit the number of rows output (head, tail, slice)
Hey, I have a large CSV that I'm looking to break up into smaller chunks to perform an import. Is this functionality already available with csvcut or csvgrep? I reviewed the docs but didn't see a way to do it.
If this isn't currently possible I'd be happy to contribute it to the project.
Cheers
So, a sort of csvhead or csvtail?
Yeah, exactly. The way I was envisioning it, you would define the number of rows you want in each section: 1000 on a 10,000-row CSV would create 10 files, with the option to include the header row in each file. However, creating csvhead and csvtail functionality would probably be a good middle step.
We try to avoid creating files so that people can pipe the output of one tool into another tool, so I think we should start with csvhead (and/or csvtail). However, you may just want to write your own bit of code using agate, on which csvkit relies.
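For anyone who wants to script the chunking in the meantime, here is a minimal sketch of the idea described above (fixed-size chunks, header optionally repeated in each). It uses only Python's standard `csv` module rather than agate, and the names `iter_chunks` and `split_csv` are hypothetical helpers, not part of csvkit or agate.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, size):
    """Yield lists of at most `size` rows from an iterator of rows."""
    while True:
        chunk = list(islice(rows, size))
        if not chunk:
            return
        yield chunk


def split_csv(infile, size, with_header=True):
    """Yield each chunk of `infile` as CSV text.

    The first row is treated as the header (an assumption); it is
    repeated at the top of every chunk when `with_header` is true.
    """
    reader = csv.reader(infile)
    header = next(reader, None)
    if header is None:  # empty input: nothing to split
        return
    for chunk in iter_chunks(reader, size):
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        if with_header:
            writer.writerow(header)
        writer.writerows(chunk)
        yield out.getvalue()
```

Each yielded string can then be written to its own file, for example by enumerating `split_csv(f, 1000)` and opening a numbered output file per chunk. Only one chunk is held in memory at a time.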
Sounds good to me!
+1
I'd seriously consider a csvsplit or csvhead or csvslice or some-such as an additional tool. Proposed interface:
csvslice -s [START_ROW] -n [NUM_ROWS] input.csv
Can hook on agate's Table.limit.
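As a rough illustration of what the proposed `-s`/`-n` options might do, here is a sketch using the standard library's `itertools.islice` instead of agate's Table.limit, so rows are streamed rather than loaded as a table. The function name `csv_slice` is hypothetical, and it assumes `START_ROW` is a 0-based index over data rows (the header is not counted and is always written); the proposal above does not specify either point.

```python
import csv
from itertools import islice


def csv_slice(infile, outfile, start=0, num=None):
    """Copy the header plus `num` data rows beginning at `start`.

    `start` is a 0-based data-row index (assumption); `num=None`
    means "through the end of the file".
    """
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    header = next(reader, None)
    if header is None:  # empty input: write nothing
        return
    writer.writerow(header)
    stop = None if num is None else start + num
    # islice skips the first `start` rows and stops before `stop`
    writer.writerows(islice(reader, start, stop))
```

Called with `sys.stdin` and `sys.stdout`, this stays pipe-friendly: `start=0, num=N` behaves like a head, and a tail would need the row count or a bounded buffer, which this sketch does not attempt.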
Not sure if/how it should handle the "gimme 50 files" case. Probably with a flag.
Re: -s [START_ROW], that will be a common option across almost all tools: #775
Noting that if there is demand, we can take inspiration from https://github.com/dannguyen/csvmedkit/blob/main/csvmedkit/utils/csvslice.py