
Option to limit the number of rows output (head, tail, slice)

Open • jasperboyd opened this issue 8 years ago • 8 comments

Hey, I have a large CSV that I'm looking to break up into smaller chunks to perform an import. Is this functionality already available with csvcut or csvgrep? I was reviewing the docs but didn't see any way to do it.

If this isn't currently possible I'd be happy to contribute it to the project.

Cheers

jasperboyd • Oct 14 '16 19:10

So, a sort of csvhead or csvtail?

jpmckinney • Oct 14 '16 21:10

Yeah, exactly. The way I was envisioning it, you would define the number of rows you want in each section (e.g., 1,000 rows per section on a 10,000-row CSV would create 10 files), with the option to include the header row in each file. However, creating csvhead and csvtail functionality would probably be a good intermediate step.

jasperboyd • Oct 14 '16 21:10
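A minimal sketch of the chunking idea described above, assuming plain Python with the standard-library csv module rather than csvkit or agate. The names split_rows and write_chunks and the chunk_<n>.csv naming scheme are hypothetical; nothing like this exists in csvkit.

```python
import csv
import os
from itertools import islice
from typing import Iterable, Iterator, List, TextIO


def split_rows(rows: Iterable[List[str]], chunk_size: int,
               include_header: bool = True) -> Iterator[List[List[str]]]:
    """Yield chunks of at most chunk_size data rows.

    When include_header is True, the first input row is treated as the header
    and repeated at the top of every chunk. When False, all rows are data.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    it = iter(rows)
    header = next(it, None) if include_header else None
    while True:
        body = list(islice(it, chunk_size))  # pull only the next chunk into memory
        if not body:
            return
        yield ([header] + body) if header is not None else body


def write_chunks(src: TextIO, chunk_size: int, directory: str = ".") -> List[str]:
    """Write each chunk to directory/chunk_<n>.csv (hypothetical naming scheme)."""
    paths = []
    for n, chunk in enumerate(split_rows(csv.reader(src), chunk_size), start=1):
        path = os.path.join(directory, f"chunk_{n}.csv")
        with open(path, "w", newline="") as out:
            csv.writer(out).writerows(chunk)
        paths.append(path)
    return paths
```

With a header plus 10,000 data rows and chunk_size=1000, split_rows yields 10 chunks of 1,001 rows each (header + 1,000). Because it reads lazily with islice, only one chunk is held in memory at a time.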

We try to avoid creating files so that people can pipe the output of one tool into another tool, so I think we should start with csvhead (and/or csvtail). However, you may just want to write your own bit of code using agate, the library csvkit relies on.

jpmckinney • Oct 14 '16 22:10
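A rough sketch of what csvhead/csvtail-style filters could look like, in keeping with the pipe-friendly approach: read rows as a stream, write CSV to stdout. It swaps in the standard-library csv module for agate so it runs without extra dependencies; head, tail and emit are hypothetical names, not csvkit APIs.

```python
import csv
import sys
from collections import deque
from itertools import islice
from typing import Iterable, List, Optional, TextIO


def head(rows: Iterable[List[str]], n: int) -> List[List[str]]:
    """Return the header row followed by the first n data rows."""
    it = iter(rows)
    header = next(it, None)
    if header is None:  # empty input
        return []
    return [header] + list(islice(it, n))


def tail(rows: Iterable[List[str]], n: int) -> List[List[str]]:
    """Return the header row followed by the last n data rows."""
    it = iter(rows)
    header = next(it, None)
    if header is None:  # empty input
        return []
    # A bounded deque keeps only the most recent n rows while streaming,
    # so memory use grows with n, not with the size of the file.
    return [header] + list(deque(it, maxlen=n))


def emit(rows: Iterable[List[str]], out: Optional[TextIO] = None) -> None:
    """Write rows as CSV to out (default: stdout) so the result can be piped."""
    target = out if out is not None else sys.stdout
    csv.writer(target, lineterminator="\n").writerows(rows)
```

For example, `emit(head(csv.reader(sys.stdin), 10))` in a small script behaves like a "first 10 rows plus header" filter that can sit between other csvkit commands in a pipeline.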

Sounds good to me!

jasperboyd • Oct 16 '16 21:10

+1

superurbi • Nov 14 '16 13:11

I'd seriously consider a csvsplit, csvhead, csvslice, or some such as an additional tool. Proposed interface:

csvslice -s [START_ROW] -n [NUM_ROWS] input.csv

It could hook into agate's Table.limit.

Not sure if/how it should handle the "gimme 50 files" case. Probably with a flag.

onyxfish • Dec 29 '16 14:12
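A sketch of the proposed csvslice interface, under a few stated assumptions: csvslice is hypothetical (not a csvkit command), START_ROW is taken as a 0-based index into the data rows with the header always emitted, and the standard-library csv module stands in for agate. The slice follows Python's slice() start/stop semantics, which, as far as I know, is also what agate's Table.limit follows. partition_bounds is one possible, hypothetical answer to the "gimme 50 files" case: it needs the total row count up front (a counting pass or buffering), then produces near-equal (start, stop) ranges that could each be fed to a -s/-n slice.

```python
import argparse
import csv
import sys
from itertools import islice
from typing import Iterable, Iterator, List, Optional, Tuple


def slice_rows(rows: Iterable[List[str]], start: int = 0,
               num: Optional[int] = None) -> Iterator[List[str]]:
    """Yield the header row, then data rows start .. start + num - 1 (0-based)."""
    if start < 0 or (num is not None and num < 0):
        raise ValueError("start and num must be non-negative")
    it = iter(rows)
    header = next(it, None)
    if header is None:  # empty input: nothing to emit
        return
    yield header
    stop = None if num is None else start + num
    yield from islice(it, start, stop)  # streams rows; never loads the whole file


def partition_bounds(total_rows: int, num_parts: int) -> List[Tuple[int, int]]:
    """Split total_rows data rows into num_parts contiguous (start, stop) ranges
    whose sizes differ by at most one."""
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    base, extra = divmod(total_rows, num_parts)
    bounds, start = [], 0
    for i in range(num_parts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first parts
        bounds.append((start, start + size))
        start += size
    return bounds


def main(argv: Optional[List[str]] = None) -> None:
    """Hypothetical entry point: csvslice -s START_ROW -n NUM_ROWS [input.csv]"""
    parser = argparse.ArgumentParser(prog="csvslice")
    parser.add_argument("-s", dest="start", type=int, default=0,
                        help="0-based index of the first data row to keep")
    parser.add_argument("-n", dest="num", type=int, default=None,
                        help="number of data rows to keep (default: all)")
    parser.add_argument("input", nargs="?", default="-",
                        help="input CSV file, or - for stdin")
    args = parser.parse_args(argv)
    src = sys.stdin if args.input == "-" else open(args.input, newline="")
    try:
        rows = slice_rows(csv.reader(src), args.start, args.num)
        csv.writer(sys.stdout, lineterminator="\n").writerows(rows)
    finally:
        if src is not sys.stdin:
            src.close()
```

Wired up as a console entry point, main() would be invoked as `csvslice -s 100 -n 50 input.csv`, or with no file argument (or `-`) to read from stdin. For a 10,000-row file split 50 ways, partition_bounds(10000, 50) gives 50 ranges of 200 rows each.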

Re: -s [START_ROW], that will be a common option across almost all tools: #775

jpmckinney • Jan 28 '17 06:01

Noting that if there is demand, we can take inspiration from https://github.com/dannguyen/csvmedkit/blob/main/csvmedkit/utils/csvslice.py

jpmckinney • May 02 '24 17:05