kantan.csv

Case insensitive headers

Open paulpdaniels opened this issue 3 years ago • 2 comments

We have a requirement to support our customers' "creative" decisions about header casing. It looks like I can't do this out of the box, but I should be able to hack together a new version of a HeaderDecoder that handles it. I think this could be generalized within the determineRowMappings function by adding an equality function String => String => Boolean. Then, when doing the index check, replace val index = csvHeader.indexOf(header) with:

val index = csvHeader.indexWhere(fn(header))
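As a minimal, self-contained sketch of that lookup (the names HeaderLookup, caseInsensitive and indexOfHeader are made up for illustration and are not part of kantan.csv), assuming a curried equality function of the shape described above:

```scala
object HeaderLookup {
  // Hypothetical curried equality, matching the String => String => Boolean shape.
  val caseInsensitive: String => String => Boolean =
    expected => actual => expected.equalsIgnoreCase(actual)

  // Index of the first CSV header equal to `header` under `fn`, or -1 if there is none.
  def indexOfHeader(csvHeader: Seq[String], header: String)(fn: String => String => Boolean): Int =
    csvHeader.indexWhere(fn(header))
}
```

With csvHeader = Seq("ID", "Name"), looking up "name" this way finds index 1, whereas a plain indexOf would return -1.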

We could potentially go even further and define an equality type class that can be derived from an Ordering (i.e. a <> b).

I'd like to submit a PR to this effect, but wanted to check on the preferred approach (assuming you'd accept the change). I can either update the existing generated decoders to take another parameter, so that

def decoder[A1: CellDecoder, R](f1: String)(f: (A1) => R): HeaderDecoder[R]

becomes

def decoder[A1: CellDecoder, R](f1: String, eq: (String, String) => Boolean = defaultEquality)(f: (A1) => R): HeaderDecoder[R]

or I can create another set of generated methods called decoderWith, using the GeneratedHeaderDecoders1 mixin approach to avoid bin-compat issues.

paulpdaniels • Feb 08 '22 07:02

That sounds great!

I think the Ordering type class would be overkill:

  • painful if it's "just" to specify a different way of comparing strings
  • requires a fair amount of rewiring to support any type as header label

Not that it wouldn't be an interesting feature to have, but it's only vaguely correlated with what you're trying to achieve - it would work roughly the same way and re-use most of the same code, but what you're trying to achieve is case-insensitivity, not arbitrary types. I wouldn't mind such a PR, but I'd argue that we should focus on solving your actual problem first.

A potential second PR would almost be trivial, once the groundwork is laid: replace String with A, replace the eq function with A => A => Boolean, write a new Eq type class (for which default implementations could be provided from types that have an instance of Ordering), and have new generated methods that rely on it to provide an eq function to the "core" implementations.
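To make that second step more concrete, here is a rough sketch of what such a type class might look like. Everything in it (Eq, eqv, from, fromOrdering, indexOfHeader) is hypothetical and not existing kantan.csv API; it only illustrates deriving a default instance from Ordering while still allowing an explicit override:

```scala
object EqSketch {
  // Hypothetical equality type class; not part of kantan.csv.
  trait Eq[A] {
    def eqv(a: A, b: A): Boolean
  }

  object Eq {
    // Builds an instance from a plain comparison function.
    def from[A](f: (A, A) => Boolean): Eq[A] = new Eq[A] {
      def eqv(a: A, b: A): Boolean = f(a, b)
    }

    // Default instance for any type with an Ordering: equal when compare returns 0.
    implicit def fromOrdering[A](implicit ord: Ordering[A]): Eq[A] =
      from[A]((a, b) => ord.equiv(a, b))
  }

  // Explicit (non-implicit) instance for case-insensitive header labels.
  val caseInsensitive: Eq[String] = from_ignoreCase

  private def from_ignoreCase: Eq[String] =
    Eq.from[String]((a, b) => a.equalsIgnoreCase(b))

  // Header lookup generalized from String to any label type A.
  def indexOfHeader[A](csvHeader: Seq[A], header: A)(implicit ev: Eq[A]): Int =
    csvHeader.indexWhere(ev.eqv(header, _))
}
```

Callers who do nothing get Ordering-based (case-sensitive) comparison for String, and pass caseInsensitive explicitly when they want the relaxed behaviour.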

I don't particularly try to maintain bin compat, and will always gleefully break it if supporting it would mean writing "worse" code than I otherwise would. What I'll insist on is source compat though - what compiled before must compile now, unless a very strong argument is made.

nrinaudo • Feb 08 '22 08:02

Ok so for reference this is what I ended up with for our case:

  // Imports needed by the snippet below.
  import kantan.csv._
  import scala.annotation.tailrec

  private def determineRowMappings(requiredHeader: Seq[String], csvHeader: Seq[String])(
    eq: (String, String) => Boolean
  ): DecodeResult[Seq[Int]] = {
    @tailrec
    def loop(missing: List[String], found: List[Int], required: List[String]): DecodeResult[Seq[Int]] = required match {
      case head :: rest =>
        val index = csvHeader.indexWhere(eq(head, _))
        if (index < 0) loop(head :: missing, found, rest)
        else loop(missing, index :: found, rest)
      case Nil if missing.nonEmpty =>
        DecodeResult.typeError(s"Missing header(s): ${missing.reverse.mkString(", ")}")
      case Nil =>
        DecodeResult.success(found.reverse)
    }

    loop(List.empty, List.empty, requiredHeader.toList)
  }

  implicit class EnrichedHeaderDecoder(val value: HeaderDecoder.type) extends AnyVal {

    def decoderWith[A1: CellDecoder, A2: CellDecoder, A3: CellDecoder, R](f1: String, f2: String, f3: String)(
      equal: (String, String) => Boolean
    )(
      f: (A1, A2, A3) => R
    ): HeaderDecoder[R] = new HeaderDecoder[R] {

      def fromHeader(header: Seq[String]): DecodeResult[RowDecoder[R]] =
        determineRowMappings(List(f1, f2, f3), header)(equal).map(mapping =>
          RowDecoder.decoder(mapping(0), mapping(1), mapping(2))(f)
        )

      def noHeader: RowDecoder[R] = RowDecoder.ordered(f)
    }

  }
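
For anyone who wants to try the mapping logic without pulling in kantan.csv, here is a stand-alone sketch of the same loop. It swaps DecodeResult for a plain Either[String, _] (an assumption made purely to keep it self-contained), and RowMappingSketch, Result and ignoreCase are illustrative names only:

```scala
import scala.annotation.tailrec

object RowMappingSketch {
  // Stand-in for kantan.csv's DecodeResult, used only to keep this sketch self-contained.
  type Result[A] = Either[String, A]

  // For each required header, finds the index of the matching CSV column under `eq`.
  def determineRowMappings(requiredHeader: Seq[String], csvHeader: Seq[String])(
    eq: (String, String) => Boolean
  ): Result[Seq[Int]] = {
    @tailrec
    def loop(missing: List[String], found: List[Int], required: List[String]): Result[Seq[Int]] =
      required match {
        case head :: rest =>
          val index = csvHeader.indexWhere(eq(head, _))
          if (index < 0) loop(head :: missing, found, rest)
          else loop(missing, index :: found, rest)
        case Nil if missing.nonEmpty =>
          Left(s"Missing header(s): ${missing.reverse.mkString(", ")}")
        case Nil =>
          Right(found.reverse)
      }

    loop(List.empty, List.empty, requiredHeader.toList)
  }

  // Case-insensitive comparison, the equality we needed for our customers' files.
  val ignoreCase: (String, String) => Boolean = _.equalsIgnoreCase(_)
}
```

Required headers Seq("id", "name") against a CSV header of Seq("NAME", "Age", "Id") map to column indices 2 and 0, and any header that matches nothing is reported in the "Missing header(s)" message.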

paulpdaniels • Feb 09 '22 03:02