
Exception when calling `updateGroups { it }.first().values()` on `GroupBy`

Open Allex-Nik opened this issue 1 month ago • 3 comments

Assume we have the following dataframe:

val df = dataFrameOf(
    "name" to columnOf("Alice", "Bob", "Charlie"),
    "age" to columnOf(15, 20, 25),
)

Calling

df.groupBy { age }.updateGroups { it }.first().values()

throws an exception both in notebooks and outside them. In notebooks, the exception message is:

java.lang.IllegalStateException: Can not insert column age because column with this path already exists in DataFrame

If we remove age from every group:

df.groupBy { age }.updateGroups { it.remove { age } }.first().values()

the exception does not occur.

Notebooks

In notebooks, a similar problem occurs even without using values(). That is, calling:

df.groupBy { age }.updateGroups { it }.first()

causes:

The problem is found in one of the loaded libraries: check library renderers
java.lang.IllegalStateException: Can not insert column age because column with this path already exists in DataFrame
org.jetbrains.kotlinx.jupyter.exceptions.ReplLibraryException: The problem is found in one of the loaded libraries: check library renderers

If we add concat(), the exception does not occur:

df.groupBy { age }.updateGroups { it }.first().concat()

Allex-Nik, Nov 13 '25 13:11

In notebooks, a similar problem occurs even without using values()

That is because notebooks call values() under the hood to render a ReducedGroupBy as a DataFrame.

Jolanrensen, Nov 13 '25 13:11

What's even weirder is that:

df.groupBy { age }.first().values()

works fine, but

df.groupBy { age }.updateGroups { it }.first().values()

throws the exception. Somehow updateGroups { it } isn't as transparent as it appears to be.

Jolanrensen, Nov 13 '25 13:11

I suspect it's because GroupByImpl.updateGroups() does not copy over keyColumnsInGroups: ColumnsSelector. Instead, .asGroupBy turns it into { none() }.

values(), however, relies on remainingColumnsSelector(), which uses this keyColumnsInGroups value. That's why we get a different result after calling updateGroups { it }.
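
To make the suspected mechanism concrete, here is a toy model in plain Kotlin. It is only a sketch of the hypothesis above, not the library's code: ToyGroupBy, firstValues() and the Set-based keyColumnsInGroups are hypothetical stand-ins, and the real GroupByImpl works with column selectors rather than maps. The assumption being modelled is that values() drops the columns named by keyColumnsInGroups from each group before re-inserting the key column, so an empty selector leaves a duplicate age behind.

```kotlin
// Hypothetical, simplified model of the suspected behaviour. Not the library's classes.
typealias Row = Map<String, Any?>

class ToyGroupBy(
    val keyName: String,
    val groups: Map<Any?, List<Row>>,
    // Plays the role of keyColumnsInGroups: which columns inside each group are keys.
    val keyColumnsInGroups: Set<String>,
) {
    // Assumption under test: the rebuilt GroupBy forgets keyColumnsInGroups,
    // which corresponds to the selector becoming { none() }.
    fun updateGroups(transform: (List<Row>) -> List<Row>): ToyGroupBy =
        ToyGroupBy(keyName, groups.mapValues { transform(it.value) }, keyColumnsInGroups = emptySet())

    // Stand-in for first().values(): take the first row of each group,
    // drop the known key columns, then insert the key column again.
    fun firstValues(): List<Row> = groups.map { (key, rows) ->
        val remaining = rows.first().filterKeys { it !in keyColumnsInGroups }
        check(keyName !in remaining) {
            "Can not insert column $keyName because column with this path already exists in DataFrame"
        }
        mapOf(keyName to key) + remaining
    }
}

fun toyGroupBy(rows: List<Row>, keyName: String): ToyGroupBy =
    ToyGroupBy(keyName, rows.groupBy { it[keyName] }, setOf(keyName))

fun main() {
    val rows: List<Row> = listOf(
        mapOf("name" to "Alice", "age" to 15),
        mapOf("name" to "Bob", "age" to 20),
        mapOf("name" to "Charlie", "age" to 25),
    )
    val grouped = toyGroupBy(rows, "age")

    // Key columns are known, so the duplicate "age" is dropped before insertion.
    println(grouped.firstValues())

    // After updateGroups { it } the key columns are forgotten and the insert collides.
    println(runCatching { grouped.updateGroups { it }.firstValues() }.exceptionOrNull())

    // Removing "age" from every group avoids the collision, as in the original report.
    println(grouped.updateGroups { group -> group.map { it - "age" } }.firstValues())
}
```

In this model, carrying keyColumnsInGroups over inside updateGroups would make the second call behave like the first. Whether that is the right fix for the real implementation depends on the open question below.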

Now we just need to figure out whether this was intentional or a bug :)

Jolanrensen, Nov 13 '25 14:11