scio
Kantan CSV BOM support
It would be great if Scio offered a way to configure Kantan's BOM support. We have to deal with CSVs that contain a BOM, and we would like to avoid the extra conversion step we currently need to make them work with Kantan.
Details are provided here: https://nrinaudo.github.io/kantan.csv/bom.html
Do you see a way to make that work? Thanks!
https://github.com/spotify/scio/blob/master/scio-extra/src/main/scala/com/spotify/scio/extra/csv/CsvIO.scala#L202
Should be possible since we have a custom read DoFn instead of the generic line-delimited TextIO. I suspect you'll have to add some implicit arguments to propagate the reader codec into the DoFn, and you'll have to make sure they're serializable.
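To make the serializability point concrete, here is a rough sketch in plain Scala. It does not use the Scio or Kantan APIs. `ReaderCharset`, `BomSketch`, `openReader` and `roundTrip` are hypothetical names made up for illustration, and the BOM stripping is done by hand purely to show the idea; an actual change would presumably rely on the Kantan support described on the linked page. The sketch assumes the setting is carried as a charset name, because `java.nio.charset.Charset` itself is not `Serializable`.

```scala
import java.io.{BufferedReader, ByteArrayInputStream, ByteArrayOutputStream}
import java.io.{InputStream, InputStreamReader, ObjectInputStream, ObjectOutputStream}
import java.io.{PushbackInputStream, Reader}
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical serializable stand-in for the reader's charset setting.
// Only the charset name is stored, so the case class serializes cleanly.
final case class ReaderCharset(name: String) {
  def charset: Charset = Charset.forName(name)
}

object BomSketch {
  private val Utf8Bom: Array[Byte] = Array(0xEF, 0xBB, 0xBF).map(_.toByte)

  // Reads up to buf.length bytes, stopping early only at end of stream.
  private def readFully(in: InputStream, buf: Array[Byte]): Int = {
    var total = 0
    var n = 0
    while (total < buf.length && n >= 0) {
      n = in.read(buf, total, buf.length - total)
      if (n > 0) total += n
    }
    total
  }

  // Opens a reader that drops a leading UTF-8 BOM when one is present.
  def openReader(in: InputStream, cs: ReaderCharset): Reader = {
    val pushback = new PushbackInputStream(in, Utf8Bom.length)
    val head = new Array[Byte](Utf8Bom.length)
    val n = readFully(pushback, head)
    val isBom = cs.charset == StandardCharsets.UTF_8 &&
      n == Utf8Bom.length && head.sameElements(Utf8Bom)
    // Anything that was not a BOM goes back onto the stream untouched.
    if (!isBom && n > 0) pushback.unread(head, 0, n)
    new InputStreamReader(pushback, cs.charset)
  }

  // Java-serialization round trip, similar to what a runner does when it
  // ships a DoFn and the values it captures to workers.
  def roundTrip[A <: AnyRef](a: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(a)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    in.readObject().asInstanceOf[A]
  }
}
```

A value like `ReaderCharset("UTF-8")` could then be passed into the DoFn's constructor, with the actual reader opened on the worker through something like `BomSketch.openReader(stream, setting)`.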
Wanna give this a shot?
Thanks for your reply. Yes, I will give it a shot next week.