Support java.sql.Date and java.sql.Timestamp so they work just as in plain Spark datasets.
Given the following snippet:
```scala
import frameless._
import org.apache.spark.sql.catalyst.util.DateTimeUtils

implicit val dateAsInt: Injection[java.sql.Date, Int] =
  Injection(DateTimeUtils.fromJavaDate, DateTimeUtils.toJavaDate)

// create some df (typically read from an orc or parquet file)
val today = new java.sql.Date(System.currentTimeMillis)
val df = Seq((42, today)).toDF("i", "d")

// and turn it into a TypedDataset
case class P(i: Int, d: java.sql.Date)
val ds = df.as[P]
val tds = TypedDataset.create(ds)
```
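As an aside for readers new to frameless: an `Injection[A, B]` is essentially a pair of mutually inverse total functions `A => B` and `B => A`. A minimal stand-alone sketch of the idea, with no Spark or frameless dependency (`SimpleInjection` and the day arithmetic are illustrative, not frameless's actual API):

```scala
// Illustrative stand-in for frameless.Injection: two mutually inverse functions.
trait SimpleInjection[A, B] {
  def apply(a: A): B
  def invert(b: B): A
}

val millisPerDay = 86400000L

// Roughly what DateTimeUtils.fromJavaDate / toJavaDate do: count days since the Unix epoch.
val dateAsDays: SimpleInjection[java.sql.Date, Int] = new SimpleInjection[java.sql.Date, Int] {
  def apply(d: java.sql.Date): Int = Math.floorDiv(d.getTime, millisPerDay).toInt
  def invert(n: Int): java.sql.Date = new java.sql.Date(n.toLong * millisPerDay)
}

val d = new java.sql.Date(17480L * millisPerDay) // midnight 2017-11-10 UTC
assert(dateAsDays.invert(dateAsDays(d)) == d)    // round trip is the identity
```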
In a plain `Dataset` you can use:
```scala
ds.filter(ds("d") === today).show
```

```
+---+----------+
|  i|         d|
+---+----------+
| 42|2017-11-10|
+---+----------+
```
but in a `TypedDataset` the same filter results in an `AnalysisException`: the `Injection` makes frameless lift the literal `today` to an `Int`, while the column `d` still has Spark's native `date` type.
```scala
tds.filter(tds('d) === today).show().run
```

```
org.apache.spark.sql.AnalysisException: cannot resolve '(`d` = FramelessLit(2017-11-10))' due to data type mismatch: differing types in '(`d` = FramelessLit(2017-11-10))' (date and int).;;
'Filter (d#82 = FramelessLit(2017-11-10))
+- Project [_1#78 AS i#81, _2#79 AS d#82]
   +- LocalRelation [_1#78, _2#79]
```
I managed to define a `TypedEncoder[java.sql.Date]` as:

```scala
import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.sql.types.{DataType, DateType}

implicit val dateEncoder: TypedEncoder[java.sql.Date] = new TypedEncoder[java.sql.Date] {
  override def nullable: Boolean = false
  override def jvmRepr: DataType = ScalaReflection.dataTypeFor[java.sql.Date]
  override def catalystRepr: DataType = DateType

  // Generated code calls DateTimeUtils.fromJavaDate to store the date as days since epoch...
  override def toCatalyst(path: Expression): Expression =
    StaticInvoke(DateTimeUtils.getClass, catalystRepr, "fromJavaDate", path :: Nil, propagateNull = true)

  // ...and DateTimeUtils.toJavaDate to read it back as java.sql.Date
  override def fromCatalyst(path: Expression): Expression =
    StaticInvoke(DateTimeUtils.getClass, jvmRepr, "toJavaDate", path :: Nil, propagateNull = true)
}
```
together with adding to `CatalystOrdered`:

```scala
implicit val framelessDateOrdered: CatalystOrdered[java.sql.Date] = of[java.sql.Date]
```
I can now do:

```scala
tds.filter(tds('d) === today).show().run
tds.filter(tds('d) > today).show().run
```
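For the `>` comparison to be sound once `CatalystOrdered[java.sql.Date]` is registered, the catalyst representation has to be order-preserving, which days-since-epoch is. A quick stand-alone sanity check of that assumption (plain Scala, no Spark; `toEpochDays` only mirrors what `DateTimeUtils.fromJavaDate` computes):

```scala
val millisPerDay = 86400000L

// Mirrors DateTimeUtils.fromJavaDate: days since the Unix epoch.
def toEpochDays(d: java.sql.Date): Long = Math.floorDiv(d.getTime, millisPerDay)

val earlier = new java.sql.Date(17479L * millisPerDay) // 2017-11-09 UTC
val later   = new java.sql.Date(17480L * millisPerDay) // 2017-11-10 UTC

// Ordering on the integer representation agrees with ordering on the dates.
assert(earlier.before(later))
assert(toEpochDays(earlier) < toEpochDays(later))
```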
I should look into doing a PR...
@ireactsam Interestingly, this only fails in the REPL. If you run it outside the REPL (say, in a unit test), everything works.
I'm new to frameless (and shapeless, and so on). Why do we need to write out the `Injection` at all? Spark has a built-in implicit date encoder (`org.apache.spark.sql.Encoders.DATE`).
@tejasmanohar Did you see this page? One typical use case is that your entire application is built around joda-time, so you naturally also want to manipulate joda-time values in your Spark tables.
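In that kind of setup, the custom date type gets mapped to a primitive Spark already understands, for instance its epoch-day count. A dependency-free sketch of the two halves of such an `Injection`, with `java.time.LocalDate` standing in for the joda-time type (with frameless, this pair of functions would be handed to `Injection(...)`):

```scala
import java.time.LocalDate

// The two halves of an Injection for a date type: total and mutually inverse.
val toDays: LocalDate => Long   = _.toEpochDay
val fromDays: Long => LocalDate = LocalDate.ofEpochDay

val d = LocalDate.of(2017, 11, 10)
assert(fromDays(toDays(d)) == d) // round trip is the identity
```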