
Support skipping comments in `CsvReader`

Open bbannier opened this issue 1 year ago • 1 comments

It would be great if DataFusion had out-of-the-box support for skipping comment lines. While none of this is "standardized", many CSV readers support skipping full comment lines. A commonly used comment identifier is a `#` prefix (e.g. the default in R's `read.table`, and supported by pandas via its `comment` parameter).
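To make the request concrete, here is a minimal, std-only sketch of the behavior I have in mind. It is not DataFusion or Arrow code; the function name `skip_comment_lines` is hypothetical, and it assumes a comment is a full line whose first byte equals the configured comment byte.

```rust
/// Hypothetical helper (not part of DataFusion or Arrow): returns the lines of
/// `input` whose first byte is not `comment`. Only full-line comments are
/// skipped; a comment byte appearing later in a line is left untouched.
fn skip_comment_lines(input: &str, comment: u8) -> Vec<&str> {
    input
        .lines()
        // Keep empty lines and any line that does not start with the comment byte.
        .filter(|line| line.as_bytes().first() != Some(&comment))
        .collect()
}
```

With `b'#'` as the comment byte, a line such as `# generated by tool` would be dropped, while a data row like `1,#2` would be kept because the `#` is not at the start of the line.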

I originally posted this as a comment in https://github.com/apache/datafusion/issues/8824#issuecomment-2078835858.

bbannier avatar Apr 27 '24 07:04 bbannier

take

pingsutw avatar Apr 27 '24 13:04 pingsutw

I opened https://github.com/apache/arrow-rs/pull/5759 to add comment support to Arrow's CSV reader. With that in place, the remaining work here is mostly passing that flag from user code through to the actual reader, and adding support for serializing the flag in the protos.
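As a rough sketch of what that plumbing could look like, the following std-only example models the flag as an optional byte on an options struct and round-trips it through a byte field. Everything here is an assumption for illustration: `CsvOptions`, `with_comment`, `encode_comment`, and `decode_comment` are hypothetical names, not the actual DataFusion or protobuf definitions, and the "empty bytes means unset" encoding is just one possible choice.

```rust
/// Hypothetical user-facing options (not the real DataFusion type).
#[derive(Debug, Clone, Default, PartialEq)]
struct CsvOptions {
    /// Byte that marks a full-line comment; `None` disables comment skipping.
    comment: Option<u8>,
}

impl CsvOptions {
    /// Builder-style setter so user code can opt in to comment skipping.
    fn with_comment(mut self, comment: u8) -> Self {
        self.comment = Some(comment);
        self
    }
}

/// Assumed wire encoding: an empty byte string means "unset",
/// a single byte carries the comment character.
fn encode_comment(comment: Option<u8>) -> Vec<u8> {
    comment.into_iter().collect()
}

/// Inverse of `encode_comment`; rejects anything longer than one byte.
fn decode_comment(bytes: &[u8]) -> Result<Option<u8>, String> {
    match bytes {
        [] => Ok(None),
        [b] => Ok(Some(*b)),
        _ => Err(format!(
            "expected at most one comment byte, got {}",
            bytes.len()
        )),
    }
}
```

The reader side would then just forward `options.comment` to the underlying CSV reader when it is set, and leave the reader's default behavior alone otherwise.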

bbannier avatar May 12 '24 09:05 bbannier