
Kantan CSV BOM support

Open f-loris opened this issue 3 years ago • 2 comments

It would be great if Scio offered a way to configure Kantan's BOM support. We have to deal with CSV files that contain a BOM and would like to avoid a separate conversion step before reading them with Kantan.

Details are provided here: https://nrinaudo.github.io/kantan.csv/bom.html

Do you see a way to make that work? Thanks!
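For context, the conversion step mentioned above could look roughly like the sketch below. It uses only the JDK and is not Scio or Kantan code; the names `BomStrip`, `skipUtf8Bom`, and `readUtf8` are hypothetical, and it assumes UTF-8 input only.

```scala
import java.io.{InputStream, PushbackInputStream}
import scala.io.Source

object BomStrip {
  // UTF-8 encoding of the byte order mark U+FEFF.
  private val Utf8Bom: Array[Byte] = Array(0xEF, 0xBB, 0xBF).map(_.toByte)

  /** Returns a stream positioned after a leading UTF-8 BOM, if one is present. */
  def skipUtf8Bom(in: InputStream): InputStream = {
    val pushback = new PushbackInputStream(in, Utf8Bom.length)
    val head = new Array[Byte](Utf8Bom.length)

    // Read up to three bytes; a single read() call may return fewer.
    var read = 0
    var eof = false
    while (!eof && read < head.length) {
      val n = pushback.read(head, read, head.length - read)
      if (n < 0) eof = true else read += n
    }

    val hasBom = read == Utf8Bom.length && java.util.Arrays.equals(head, Utf8Bom)
    // No BOM: push the bytes back so the caller sees the stream unchanged.
    if (!hasBom && read > 0) pushback.unread(head, 0, read)
    pushback
  }

  /** Helper for the example: drains a stream as UTF-8 text. */
  def readUtf8(in: InputStream): String =
    Source.fromInputStream(in, "UTF-8").mkString
}
```

Having Kantan detect the BOM itself would make a pre-processing pass like this unnecessary.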

f-loris, Jan 28 '21 10:01

https://github.com/spotify/scio/blob/master/scio-extra/src/main/scala/com/spotify/scio/extra/csv/CsvIO.scala#L202

Should be possible, since we have a custom read DoFn instead of the generic line-delimited TextIO. I suspect you'll have to add some implicit arguments to propagate the reader codec into the DoFn, and make sure they're serializable.
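A minimal sketch of the serializability concern, assuming `scala.io.Codec` itself cannot be shipped to workers (it wraps a `java.nio.charset.Charset`, which is not `Serializable`). Everything here is hypothetical and independent of Scio's actual `CsvIO`: `SerializableCodec` carries only the charset name, and `DecodeFn` stands in for the read DoFn.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.io.Codec

object CodecShipping {
  /** Hypothetical wrapper: serializes the charset name, rebuilds the Codec on first use. */
  final case class SerializableCodec(charsetName: String) {
    @transient lazy val codec: Codec = Codec(charsetName)
  }

  object SerializableCodec {
    def fromCodec(codec: Codec): SerializableCodec =
      SerializableCodec(codec.charSet.name)
  }

  /** Stand-in for a DoFn: holds the wrapper and decodes bytes on the worker side. */
  final class DecodeFn(sc: SerializableCodec) extends Serializable {
    def decode(bytes: Array[Byte]): String = new String(bytes, sc.codec.charSet)
  }

  /** Java-serializes and deserializes a value, roughly what a runner does to a DoFn. */
  def roundTrip[A <: AnyRef](a: A): A = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(a)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }
}
```

With something along these lines, an implicit `Codec` at the call site could be converted via `SerializableCodec.fromCodec` before the DoFn is constructed, and the DoFn would rebuild the `Codec` lazily after deserialization.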

Wanna give this a shot?

nevillelyh, Feb 02 '21 21:02

Thanks for your reply. Yes, I will give it a shot next week.

f-loris, Feb 05 '21 08:02