
Union type columns

Open Jolanrensen opened this issue 2 years ago • 3 comments

There are cases, for instance when reading from JSON or Excel, or when converting iterables to DataFrames containing exceptions, where columns containing multiple types are created.

Currently, DataFrame tries to find the lowest common ancestor of the two types, just like Kotlin's type system, which in many cases (such as for Double+String) results in Serializable or Comparable<*> (or Any). In such cases, it might be beneficial to have a simple union-type wrapper so people can more easily handle both types individually or convert one of the two types to the other manually.
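To illustrate why the inferred common supertype is not very helpful, here is a minimal plain-Kotlin sketch (no DataFrame API involved; `mixed` and `describe` are names made up for this example). A `Double` and a `String` share little beyond `Comparable` and `Serializable`, so any real work requires a manual type check on every value:

```kotlin
import java.io.Serializable

// Values like those a Double+String column would hold (illustrative data).
val mixed: List<Any> = listOf(1.5, "hi")

// Both values implement Comparable and Serializable, which is all the
// common supertype can tell us about them.
val shareOnlyGenericInterfaces: Boolean =
    mixed.all { it is Comparable<*> && it is Serializable }

// To do anything type-specific, each value has to be checked by hand.
fun describe(value: Any?): String = when (value) {
    is Double -> "Double: $value"
    is String -> "String: $value"
    else -> "other: $value"
}

val descriptions: List<String> = mixed.map { describe(it) }
```

Nothing here is exhaustive: the `else` branch is required because the compiler has no way to know that only two types can occur.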

We could introduce a 2-type wrapper (or 3-type, n-type, but I'd rather not) with a set of extension functions (on ValueColumn<UnionType<A, B>>) and proper type-displaying to make it easier for our users. Potentially we can also investigate how other libraries handle these cases.
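A rough sketch of what such a 2-type wrapper could look like in plain Kotlin. This is only an assumption about one possible shape: `UnionType`, `First`, `Second`, `fold`, and `firstOrNull` are hypothetical names, and none of this exists in the DataFrame API. A `List` stands in for a `ValueColumn<UnionType<A, B>>` to keep the example self-contained:

```kotlin
// Hypothetical two-type wrapper; exactly one of the two cases holds a value.
sealed interface UnionType<out A, out B> {
    data class First<out A>(val value: A) : UnionType<A, Nothing>
    data class Second<out B>(val value: B) : UnionType<Nothing, B>
}

// Handle both cases exhaustively; no `else` branch is needed.
inline fun <A, B, R> UnionType<A, B>.fold(onFirst: (A) -> R, onSecond: (B) -> R): R =
    when (this) {
        is UnionType.First -> onFirst(value)
        is UnionType.Second -> onSecond(value)
    }

// Returns the value if it is of the first type, otherwise null.
fun <A, B> UnionType<A, B>.firstOrNull(): A? = when (this) {
    is UnionType.First -> value
    is UnionType.Second -> null
}

// Stand-in for a column holding the values 1 and "hi".
val column: List<UnionType<Int, String>> = listOf(
    UnionType.First(1),
    UnionType.Second("hi"),
)

// Convert one of the two types to the other manually.
val asStrings: List<String> = column.map { it.fold({ n -> n.toString() }, { s -> s }) }

// Or keep only the values of the first type.
val onlyInts: List<Int> = column.mapNotNull { it.firstOrNull() }
```

Because the interface is sealed, the compiler can verify that both cases are handled, which is the main advantage over a column typed as `Comparable<*>` or `Any`.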

Jolanrensen avatar Oct 09 '23 14:10 Jolanrensen

If it's possible, could you please attach a copy of an xlsx/json file for that case?

zaleslaw avatar Oct 09 '23 15:10 zaleslaw

Don't think it's necessary to have a separate file. It's pretty trivial to get a sample:

[
    { "a": 1 },
    { "a": "hi" }
]

Similar for xlsx.

Jolanrensen avatar Oct 09 '23 15:10 Jolanrensen

Another representation of UnionType could be a ColumnGroup with a nullable column for each type. My concern is that, if it's done implicitly, it will confuse a person who wants to do something with this column right after reading: they would need to look at the schema or the resulting table to understand what happened.
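For comparison, the nullable-column-per-type layout could be modelled roughly like this in plain Kotlin. It is only a sketch of the data shape, not of the actual ColumnGroup API; `SplitValue` and `split` are hypothetical names, and the input is the `a` column from the JSON sample above:

```kotlin
// Hypothetical row of the group: one nullable field per type,
// exactly one of which is non-null for any given row.
data class SplitValue(val int: Int?, val string: String?)

// Route each raw value into the field matching its runtime type.
fun split(value: Any?): SplitValue = when (value) {
    is Int -> SplitValue(int = value, string = null)
    is String -> SplitValue(int = null, string = value)
    else -> throw IllegalArgumentException("Unsupported value: $value")
}

// The "a" column from the JSON sample: 1 and "hi".
val raw: List<Any> = listOf(1, "hi")
val splitColumn: List<SplitValue> = raw.map { split(it) }

// Each "sub-column" is then a plain nullable list.
val intColumn: List<Int?> = splitColumn.map { it.int }
val stringColumn: List<String?> = splitColumn.map { it.string }
```

The resulting sub-columns are easy to work with individually, but the single `a` column the user expected is gone, which is the source of the confusion described above.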

koperagen avatar Oct 09 '23 16:10 koperagen