frameless icon indicating copy to clipboard operation
frameless copied to clipboard

Support java.sql.Date and java.sql.Timestamp so they work just as in plain Spark datasets.

Open ireactsam opened this issue 8 years ago • 4 comments

Given the following snippet:

    import frameless._
    import org.apache.spark.sql.catalyst.util.DateTimeUtils

    implicit val dateAsInt: Injection[java.sql.Date, Int] = Injection(DateTimeUtils.fromJavaDate, DateTimeUtils.toJavaDate)

    // create some df (typically read from an orc or parquet file)
    val today = new java.sql.Date(System.currentTimeMillis)
    val df = Seq((42, today)).toDF("i", "d")

    // and turn it into a TypedDataset
    case class P(i: Int, d: java.sql.Date)
    val ds = df.as[P]
    val tds = TypedDataset.create(ds)

in plain Dataset you can use:

ds.filter(ds("d") === today).show 

+---+----------+
|  i|         d|
+---+----------+
| 42|2017-11-10|
+---+----------+

but in TypedDataset this results in an AnalysisException

tds.filter(tds('d) === today).show().run
org.apache.spark.sql.AnalysisException: cannot resolve '(`d` = FramelessLit(2017-11-10))' due to data type mismatch: differing types in '(`d` = FramelessLit(2017-11-10))' (date and int).;;
'Filter (d#82 = FramelessLit(2017-11-10))
+- Project [_1#78 AS i#81, _2#79 AS d#82]
   +- LocalRelation [_1#78, _2#79]

ireactsam avatar Nov 10 '17 13:11 ireactsam

I managed to define a TypedEncoder[java.sql.Date] as:

  implicit val dateEncoder: TypedEncoder[java.sql.Date] = new TypedEncoder[java.sql.Date] {
    import org.apache.spark.sql.catalyst.util.DateTimeUtils
    override def nullable: Boolean = false
    override def jvmRepr: DataType = ScalaReflection.dataTypeFor[java.sql.Date]
    override def catalystRepr: DataType = DateType
    override def toCatalyst(path: Expression): Expression =
      StaticInvoke(DateTimeUtils.getClass, catalystRepr, "fromJavaDate", path :: Nil, propagateNull = true)
    override def fromCatalyst(path: Expression): Expression =
      StaticInvoke(DateTimeUtils.getClass, jvmRepr, "toJavaDate", path :: Nil, propagateNull = true)
  }

together with adding to CatalystOrdered:

implicit val framelessDateOrdered     : CatalystOrdered[java.sql.Date]      = of[java.sql.Date]

I can now do:

tds.filter(tds('d) === today).show().run
tds.filter(tds('d) > today).show().run

I should look in doing a PR...

ireactsam avatar Nov 12 '17 17:11 ireactsam

@ireactsam Interestingly, this only fails on the REPL. If you run it outside the REPL (say in a unit test), everything works.

imarios avatar Feb 01 '18 05:02 imarios

New to frameless (and shapeless and so on). Why do we need to write out the Injection in general? Spark has a built-in implicit date encoder (org.apache.spark.sql.Encoders.DATE).

tejasmanohar avatar Jun 05 '18 19:06 tejasmanohar

@tejasmanohar Did you see this page? One typical use case would be that your entire application is based around joda-time, so you naturally also want to manipulate them in your Spark tables.

OlivierBlanvillain avatar Jun 07 '18 06:06 OlivierBlanvillain