frameless
Unexpected behavior of Encoding of nested Options
> val f: Option[Option[Option[Int]]] = None
> val d = TypedDataset.create( f :: Nil )
> println( d.collect().run() )
WrappedArray(Some(Some(None)))
The expected result is WrappedArray(None).
I'm not sure there is a way to fix it. Internally, Spark represents Option[Option[Int]] as a single nullable Integer, so Some(None) and None are encoded identically. To fix it we would have to customize the encoding for such cases, but that would break compatibility with the rest of Spark code, so the best we can do is to fail compilation for the cases we don't support properly.
Spark has exactly the same behaviour:
scala> val xs: List[Option[Option[Int]]] = List(Some(None), None)
xs: List[Option[Option[Int]]] = List(Some(None), None)
scala> spark.createDataset(xs).collect()
res3: Array[Option[Option[Int]]] = Array(Some(None), Some(None))
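The collapse can be modelled in plain Scala without Spark. This is only a toy sketch: `NestedOptionModel`, `encode`, `decode` and `roundTrip` are hypothetical names, and the code is not Spark's actual encoder. It assumes the single nullable Integer representation described above and a decoder that always reads a null cell as Some(None), which is what the REPL output suggests.

```scala
// Toy model only: this is NOT Spark's encoder, just an illustration of
// what happens when both Option layers share one nullable cell.
object NestedOptionModel {
  // Both None and Some(None) become null; only Some(Some(n)) keeps a value.
  def encode(v: Option[Option[Int]]): java.lang.Integer =
    v.flatten match {
      case Some(n) => Int.box(n)
      case None    => null
    }

  // A null cell carries no record of which layer was empty, so the decoder
  // has to pick one reading. Assumption: it always picks Some(None).
  def decode(cell: java.lang.Integer): Option[Option[Int]] =
    Some(Option(cell).map(_.intValue))

  def roundTrip(v: Option[Option[Int]]): Option[Option[Int]] =
    decode(encode(v))
}
```

In this model, `roundTrip(None)` and `roundTrip(Some(None))` both come back as `Some(None)`, while `roundTrip(Some(Some(3)))` is preserved.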
Another example is X1[Option[X1[Option[Int]]]]. It doesn't work because in Spark a struct itself can't be nullable, only its fields can, so in this case we can't tell whether X1#a is null or X1#a#a is null.
Very thorough explanation! I am not sure there is anything we can do. I am not even sure there is any difference in semantics. For example, None and Some(Some(None)) have pretty much the same semantics for me.
It's actually possible to have an unboxed option where None and Some(None) can be differentiated. scala-unboxed-option uses the following trick: values are stored as themselves (thus the "unboxed"), and there is one "None value" for each level of nesting:
object None extends Option[Nothing]
object SomeNone extends Option[Option[Nothing]]
object SomeSomeNone extends Option[Option[Option[Nothing]]]
// ...
But that would be something to change in Spark not in Frameless :)
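To make the trick a bit more concrete, here is a simplified, untyped sketch. It is not the scala-unboxed-option API: `UnboxedSketch`, `NoneAt`, `some`, `isEmpty` and `get` are hypothetical names, and a depth counter stands in for the separate per-level objects listed above.

```scala
// Simplified, untyped sketch; names are hypothetical and this is not the
// scala-unboxed-option API.
object UnboxedSketch {
  // NoneAt(0) plays the role of None, NoneAt(1) of Some(None), and so on.
  final case class NoneAt(depth: Int)

  val none: Any = NoneAt(0)

  // Wrapping an empty value bumps its depth; a real value is stored as itself.
  def some(v: Any): Any = v match {
    case NoneAt(d) => NoneAt(d + 1)
    case other     => other
  }

  def isEmpty(v: Any): Boolean = v == NoneAt(0)

  // Unwrapping one layer: lower the depth, or return the value unchanged.
  def get(v: Any): Any = v match {
    case NoneAt(0) => throw new NoSuchElementException("get on None")
    case NoneAt(d) => NoneAt(d - 1)
    case other     => other
  }
}
```

Here `some(none)` is distinct from `none`, and `get(some(none))` is empty again, so the nesting level survives without any boxing of real values.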
@kanterov what about things like TypedDataset[Tuple1[Vector[Option[Vector[Option[A]]]]]]? I'm actually using this ...
@tscholak hm... good question. Probably the best thing is to add it to the test suite. AFAIK it should work if A is primitive.
In my case, A is primitive, yes. I didn't have any problems with it so far, but just to be sure that it doesn't break any laws, I can add it to the tests.