
Support N-x configuration for To option to skip trailer rows

Open bluejack opened this issue 3 years ago • 8 comments

Summary

In csv-parse, the 'to' and 'to_line' options do not adequately support stripping trailing records. Recommend adding functionality that would allow a syntax for these options, e.g. 'T.1', to indicate stopping at end-1.

Motivation

We process some files that include a trailing "summary" record for the file. These are difficult to deal with.

Alternative

We could pre-process the file to strip these trailing rows, but that can be tricky when records contain fields with embedded newlines.

Draft

This can be implemented easily by placing each parsed row into a holding queue, pushing the configured "T-minus" row along the pipe rather than the current one, and simply dropping the queue at EOF.
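The draft above can be sketched independently of csv-parse internals. A minimal version, assuming the rows are already parsed (the `skipTrailing` function name is hypothetical, chosen for illustration):

```javascript
// Buffer the last n rows in a queue; a row is only emitted once n newer
// rows have arrived, so whatever remains in the queue at EOF is exactly
// the n trailing rows, which are discarded.
function skipTrailing(rows, n) {
  const queue = []
  const emitted = []
  for (const row of rows) {
    queue.push(row)
    if (queue.length > n) {
      emitted.push(queue.shift())
    }
  }
  // At EOF the queue holds the n trailing rows; drop them.
  return emitted
}

// Example: drop a one-row trailer
// skipTrailing([["a","b"], ["c","d"], ["Rows:","2"]], 1)
// → [["a","b"], ["c","d"]]
```

Memory use is bounded by n buffered rows regardless of file size, so the approach still streams.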


bluejack avatar Aug 05 '22 20:08 bluejack

Could you provide a sample of what you expect? That would help my understanding.

wdavidw avatar Aug 06 '22 19:08 wdavidw

sure. eg csv:

"Data","Data","Data","More Data","More and more data","Final data field" "Rows:","1"

Instead of processing that trailer row, I want to skip it. So, I would want to configure the parser:

skip_trailing: 1

Does that make more sense?

bluejack avatar Aug 08 '22 18:08 bluejack

Not really, do you want this:

const data = "a,b,c\nRows:1"
const records = parse(data, {skip_trailing: 1})
records.should.eql([["a","b","c"]])

wdavidw avatar Aug 08 '22 18:08 wdavidw

Yes! Thank you for reading my mind, since apparently my communication skills are weak!

bluejack avatar Aug 09 '22 17:08 bluejack

If you think this is a viable idea, and you would like me to take a stab at it, I'm happy to submit a pull request later this week. If you don't think it's a good idea, or would rather do it yourself, I won't dive in.

bluejack avatar Aug 09 '22 21:08 bluejack

Hum, I don't think that's possible. csv-parse is designed to handle an unlimited number of records. This use case involves knowing in advance how many records there are in order to skip the last n records. This isn't scalable.

wdavidw avatar Aug 10 '22 23:08 wdavidw

I'm already doing it outside your parser by buffering the N trailing rows in a rotating queue and dropping the queue at end-of-stream; I could implement that in the parser if you wanted the functionality.

bluejack avatar Aug 10 '22 23:08 bluejack

Curious to see your code. Not sure that I want to make the parser more complex, but let me look first at how you are doing this.

wdavidw avatar Aug 11 '22 12:08 wdavidw