Exception when calling `updateGroups { it }.first().values()` on `GroupBy`
Assume we have the following dataframe:
val df = dataFrameOf(
"name" to columnOf("Alice", "Bob", "Charlie"),
"age" to columnOf(15, 20, 25),
)
Calling
df.groupBy { age }.updateGroups { it }.first().values()
throws an exception both inside and outside notebooks. In notebooks, the message reads:
java.lang.IllegalStateException: Can not insert column `age` because column with this path already exists in DataFrame
If we remove age from every group:
df.groupBy { age }.updateGroups { it.remove { age } }.first().values()
the exception does not occur.
Notebooks
In notebooks, a similar problem occurs even without using values(). That is, calling:
df.groupBy { age }.updateGroups { it }.first()
causes:
The problem is found in one of the loaded libraries: check library renderers
java.lang.IllegalStateException: Can not insert column `age` because column with this path already exists in DataFrame
org.jetbrains.kotlinx.jupyter.exceptions.ReplLibraryException: The problem is found in one of the loaded libraries: check library renderers
If we add concat(), the exception does not occur:
df.groupBy { age }.updateGroups { it }.first().concat()
Regarding "in notebooks, a similar problem occurs even without using values()": that happens because notebooks call values() under the hood to render a ReducedGroupBy as a DataFrame.
What's even weirder is that:
df.groupBy { age }.first().values()
works fine, but
df.groupBy { age }.updateGroups { it }.first().values()
throws the exception. Somehow updateGroups { it } isn't as transparent as it appears to be.
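I suspect it's because GroupByImpl.updateGroups() does not copy over its keyColumnsInGroups: ColumnsSelector; .asGroupBy turns it into { none() }.
values(), however, relies on remainingColumnsSelector(), which uses this keyColumnsInGroups value. After updateGroups { it }, the key column age is therefore no longer excluded from the remaining columns, so values() presumably tries to insert it a second time. That would explain both the different result and the exception above, and also why removing age from every group avoids it.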
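To make the suspected mechanism concrete, here is a minimal toy model in plain Kotlin. It is NOT the Kotlin DataFrame API: ToyFrame, ToyGroupBy, toyGroupBy, updateGroupsDroppingSelector, updateGroupsKeepingSelector and firstValues are hypothetical names. The sketch assumes that first().values() emits the key column followed by the remaining columns (all group columns minus keyColumnsInGroups), and that updateGroups resets keyColumnsInGroups to "none", as suspected above.

```kotlin
// Hypothetical toy model, NOT the Kotlin DataFrame API.
// A "frame" is an ordered map from column name to column values.
typealias ToyFrame = Map<String, List<Any?>>

class ToyGroupBy(
    val keyName: String,                  // single key column, for simplicity
    val keyValues: List<Any?>,            // one key value per group
    val groups: List<ToyFrame>,           // one sub-frame per group
    val keyColumnsInGroups: Set<String>,  // key columns that still live inside the groups
)

fun ToyFrame.toyGroupBy(key: String): ToyGroupBy {
    val keyCol = getValue(key)
    // Row indices per distinct key value, in first-occurrence order.
    val rowsByKey = keyCol.indices.groupBy { keyCol[it] }
    val groups = rowsByKey.values.map { rows -> mapValues { (_, col) -> rows.map { col[it] } } }
    // Groups still contain the key column, and the selector records that.
    return ToyGroupBy(key, rowsByKey.keys.toList(), groups, setOf(key))
}

// Assumed current behaviour: groups are transformed, but the selector is reset to "none".
fun ToyGroupBy.updateGroupsDroppingSelector(transform: (ToyFrame) -> ToyFrame): ToyGroupBy =
    ToyGroupBy(keyName, keyValues, groups.map(transform), emptySet())

// Alternative: copy the selector over, so updateGroups { it } is transparent.
fun ToyGroupBy.updateGroupsKeepingSelector(transform: (ToyFrame) -> ToyFrame): ToyGroupBy =
    ToyGroupBy(keyName, keyValues, groups.map(transform), keyColumnsInGroups)

// Rough analogue of first().values(): the key column, then the first row of
// every remaining column (group columns minus keyColumnsInGroups).
fun ToyGroupBy.firstValues(): ToyFrame {
    val result = linkedMapOf<String, MutableList<Any?>>(keyName to keyValues.toMutableList())
    val remaining = groups.firstOrNull()?.keys.orEmpty() - keyColumnsInGroups
    for (name in remaining) {
        // Mirrors the reported failure: the key column is already in the result.
        check(name !in result) {
            "Can not insert column `$name` because column with this path already exists in DataFrame"
        }
        result[name] = groups.map { it.getValue(name).firstOrNull() }.toMutableList()
    }
    return result
}
```

In this model, df.toyGroupBy("age").firstValues() succeeds, the same call after updateGroupsDroppingSelector { it } fails with the familiar message, and both updateGroupsKeepingSelector { it } and updateGroupsDroppingSelector { it - "age" } succeed again, matching the workaround of removing age from every group.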
Now we just need to figure out if this was done intentionally or it's a bug :)