Unexpected behavior of Encoding of nested Options

Open imarios opened this issue 7 years ago • 8 comments

> val f: Option[Option[Option[Int]]] = None
> val d = TypedDataset.create( f :: Nil )
> println( d.collect().run() )
WrappedArray(Some(Some(None)))

Expected: WrappedArray(None)

imarios avatar Feb 05 '17 06:02 imarios

I'm not sure there is a way to fix it. Internally, Spark represents Option[Option[Int]] as a single nullable Integer, so Some(None) and None are encoded identically. To fix it we would have to customize the encoding for such cases, but that would break compatibility with the rest of Spark code, so the best we can do is to break compilation for the cases we don't support properly.
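
To make the loss of information concrete, here is a minimal plain-Scala sketch of the idea. It is not Spark's or Frameless's actual encoder: `NullableEncodingSketch`, `encode`, and `decode` are hypothetical names, and a `java.lang.Integer` that may be `null` stands in for the single nullable column.

```scala
// Hypothetical model only: a nullable java.lang.Integer plays the role
// of the single nullable column described above.
object NullableEncodingSketch {
  // Both "empty" shapes collapse into the same null cell.
  def encode(v: Option[Option[Int]]): java.lang.Integer = v match {
    case Some(Some(i)) => Int.box(i)
    case Some(None)    => null
    case None          => null
  }

  // A null cell can only be mapped back to one of the two shapes.
  // Here it becomes Some(None), matching the output reported in this thread.
  def decode(cell: java.lang.Integer): Option[Option[Int]] =
    if (cell == null) Some(None) else Some(Some(cell.intValue))
}
```

Under this model, `decode(encode(None))` yields `Some(None)` rather than `None`, which is the same shape of result as the one reported in this issue.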

kanterov avatar Feb 05 '17 22:02 kanterov

Spark has exactly the same behaviour:

scala> val xs: List[Option[Option[Int]]] = List(Some(None), None)
xs: List[Option[Option[Int]]] = List(Some(None), None)

scala> spark.createDataset(xs).collect()
res3: Array[Option[Option[Int]]] = Array(Some(None), Some(None))

kanterov avatar Feb 05 '17 22:02 kanterov

Another problematic case is X1[Option[X1[Option[Int]]]]. It doesn't work because in Spark a struct itself can't be nullable, only its fields can, so in this case we can't tell whether X1#a is null or X1#a#a is null.

kanterov avatar Feb 05 '17 22:02 kanterov

Very thorough explanation! I am not sure there is anything we can do. I am not even sure there is any difference in semantics. For example, None and Some(Some(None)) have pretty much the same semantics for me.

imarios avatar Feb 06 '17 04:02 imarios

It's actually possible to have an unboxed option where None and Some(None) can be differentiated. scala-unboxed-option uses the following trick: values are stored as themselves (thus the "unboxed"), and there is one "None value" for each level of nesting:

object None extends Option[Nothing]
object SomeNone extends Option[Option[Nothing]]
object SomeSomeNone extends Option[Option[Option[Nothing]]]
// ...

But that would be something to change in Spark not in Frameless :)
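
As a rough illustration of that trick, the sketch below models "one None value per nesting level" in plain Scala. It is not the scala-unboxed-option implementation: `UnboxedOptionSketch`, `NoneAt`, `encode`, and `decode` are hypothetical names chosen for this example, and only two levels of nesting are handled.

```scala
// Hypothetical model of per-level sentinels; not the library's code.
object UnboxedOptionSketch {
  // One sentinel per nesting depth: depth 0 is None, depth 1 is Some(None).
  final case class NoneAt(depth: Int)

  // Present values are stored as themselves; only empty cases use a sentinel.
  def encode(v: Option[Option[Int]]): Any = v match {
    case None          => NoneAt(0)
    case Some(None)    => NoneAt(1)
    case Some(Some(i)) => i
  }

  // Each sentinel maps back to exactly one shape, so nothing is lost.
  def decode(cell: Any): Option[Option[Int]] = cell match {
    case NoneAt(0) => None
    case NoneAt(1) => Some(None)
    case i: Int    => Some(Some(i))
    case other     => throw new IllegalArgumentException(s"unexpected cell: $other")
  }
}
```

Because the two empty cases use different sentinels, `decode(encode(v))` returns `v` for every `Option[Option[Int]]`, unlike an encoding that maps both to a single null.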

OlivierBlanvillain avatar Feb 06 '17 09:02 OlivierBlanvillain

@kanterov what about things like TypedDataset[Tuple1[Vector[Option[Vector[Option[A]]]]]]? I'm actually using this ...

tscholak avatar Jul 19 '17 19:07 tscholak

@tscholak hm... good question. Probably the best thing is to add it to the test suite; AFAIK it should work if A is primitive.

kanterov avatar Jul 19 '17 19:07 kanterov

In my case, A is primitive, yes. I didn't have any problems with it so far, but just to be sure that it doesn't break any laws, I can add it to the tests.

tscholak avatar Jul 19 '17 19:07 tscholak