kantan.csv Optional columns

This PR adds the ability to specify "optional" columns (at a slight performance cost due to additional allocations). The approach here was to basically build in an Apply, by describing both an product method and map method for the HeaderDecoder type. This allows HeaderDecoders to be composed together to form larger decoders (I think I could potentially also define a pure method for the HeaderDecoder which would describe a column that just has a default value, in order to get a full Applicative but I'm not sure if there is any utility to that).

The tests include some examples of what can be done with this approach.

Closes #215

Feb 14 '22 15:02 paulpdaniels

Before I get too deep into understanding what this does and how, can you explain how it improves on the existing support?

import kantan.csv._
import kantan.csv.ops._

// For readability purposes
type Row = (Int, Option[Boolean])
object Row {
  def apply(i: Int, o: Option[Boolean]): Row = (i, o)
}

implicit val decoder: HeaderDecoder[Row] = HeaderDecoder.decoder("i", "o")(Row.apply _)

val csv =
  """i,o
     |1,
     |2,true""".stripMargin

csv.asCsvReader[Row](rfc.withHeader).toList

I'm not necessarily against the idea of allowing header decoders to combine, but I'm not entirely sure I understand why it's correlated with supporting optional cells.

Feb 14 '22 20:02 nrinaudo

On second read, this sounded combative but I really didn’t mean it to. I’m working under the assumption that your proposed modifications add support for optional cells in a different way than already present, but that I’m failing to see it.

Feb 14 '22 22:02 nrinaudo

Ah, got it. What we support is optional cells, but you’re working towards optional columns - including the header. That’s what I’d missed.

Feb 14 '22 22:02 nrinaudo

but you’re working towards optional columns - including the header

Exactly, yeah I should have included some examples in the proposal, since the linked topic discussed both (optional column vs optional cell). My motivating issue is that we have an upload endpoint that allows someone to push a CSV of items to enrich. By default we require them to import three columns, but our enrichment can work with just one or two (not as well but 🤷 ). We currently just say they have to submit all three columns even if they leave one or more blank i.e.

Name,Email,Company
,[email protected],

But we'd like users to be able to just drop the column altogether if they don't have it.

Email
[email protected]

this sounded combative but I really didn’t mean it to.

Also no worries, I didn't read it as combative at all, it seemed perfectly reasonable to ask for a rationale for this PR. 👍

Feb 15 '22 09:02 paulpdaniels

Hey! Are you planning on merging this at some point? This feature would be crucial for my needs but I won't fork yet if this is going to be merged. Thanks! 🙂

Sep 06 '22 12:09 htunk

This should hit maven central in a few hours. Sorry, I completely forgot about that PR!

Sep 06 '22 14:09 nrinaudo

kantan.csv kantan.csv copied to clipboard

Optional columns

kantan.csv
kantan.csv copied to clipboard