Exception in documentation example of `toDataFrame`

Open sorokod opened this issue 4 months ago • 6 comments

In Operations/Create/DataFrame we have:

val df = students.toDataFrame {
    // add column
    "year of birth" from { 2021 - it.age }

    // scan all properties
    properties(maxDepth = 1) {
        exclude(Score::subject) // `subject` property will be skipped from object graph traversal
        preserve<Name>() // `Name` objects will be stored as-is without transformation into DataFrame
    }

    // add column group
    "summary" {
        "max score" from { it.scores.maxOf { it.value } }
        "min score" from { it.scores.minOf { it.value } }
    }
}

Executing this in a Kotlin Notebook cell results in: Exception while analyzing expression in (13,28) in Line_123.jupyter.kts

Line 13 refers to "max score" from { it.scores.maxOf { it.value } }

Gradle settings:

plugins {
    kotlin("jvm") version "2.2.20-Beta1"
    kotlin("plugin.dataframe") version "2.2.20-Beta1"
}

dependencies {
    implementation("org.jetbrains.kotlinx:dataframe:1.0.0-Beta2")
    testImplementation(kotlin("test"))
}

sorokod avatar Aug 21 '25 08:08 sorokod

Hi!

I suspect this is due to this issue: https://github.com/Kotlin/dataframe/issues/1116. Notebooks have issues with statistics and explicitly-not-nullable types. There's a variant of 1.0.0-Beta2 that forces statistics to only be callable on non-null columns: 1.0.0-dev-7089. Could you check whether that works?

The relevant issue can be tracked here: https://youtrack.jetbrains.com/issue/KT-76441/IllegalStateException-null-DefinitelyNotNullType-for-T-exception-while-analyzing-expression

Jolanrensen avatar Aug 21 '25 10:08 Jolanrensen

With implementation("org.jetbrains.kotlinx:dataframe:1.0.0-dev-7089") there is no error and the output of students.toDataFrame {..}.print() is:

   year of birth                                   name age              scores                      summary
 0          2006 Name(firstName=Alice, lastName=Cooper)  15             [2 x 1] { max score:4, min score:3 }
 1          2001   Name(firstName=Bob, lastName=Marley)  20 [1 x 1] { value:5 } { max score:5, min score:5 }
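For anyone reproducing this without the library, the printed rows imply input data roughly like the following. This is only a plain-Kotlin sketch: the class shapes are inferred from the output above, the subject names are invented, and `Summary` and `summarize` are hypothetical helpers that mirror the three computed columns rather than anything from the DataFrame API.

```kotlin
// Hypothetical model classes, inferred from the printed output; the classes in the docs may differ.
data class Name(val firstName: String, val lastName: String)
data class Score(val subject: String, val value: Int)
data class Student(val name: Name, val age: Int, val scores: List<Score>)

// Sample data chosen to match the two printed rows; the subject names are invented.
val students = listOf(
    Student(Name("Alice", "Cooper"), 15, listOf(Score("math", 4), Score("biology", 3))),
    Student(Name("Bob", "Marley"), 20, listOf(Score("music", 5))),
)

// Hypothetical holder for the three computed columns of the snippet.
data class Summary(val yearOfBirth: Int, val maxScore: Int, val minScore: Int)

// Same expressions as in the toDataFrame block, applied to a single student.
fun summarize(student: Student): Summary = Summary(
    yearOfBirth = 2021 - student.age,
    maxScore = student.scores.maxOf { it.value },
    minScore = student.scores.minOf { it.value },
)

fun main() {
    students.forEach { println("${it.name.firstName}: ${summarize(it)}") }
}
```

The expressions themselves are ordinary Kotlin and evaluate fine outside a notebook, which fits the diagnosis that the failure comes from how the notebook analyzes the call rather than from the example's logic.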

With notebooks being cached and multiple ways to specify library dependencies, it is not always clear which version of DF is being executed. Is there a way to determine the version of DF, similar to LetsPlot.getInfo()?

sorokod avatar Aug 21 '25 10:08 sorokod

Actually yes, you can call dataFrameConfig.version :)

Jolanrensen avatar Aug 21 '25 12:08 Jolanrensen

@Jolanrensen looks like we need to release Beta-3 and Beta-3-for-Notebooks

zaleslaw avatar Aug 22 '25 08:08 zaleslaw

@Jolanrensen I've heard that you tested this and that it's fixed as of the new IDEA version. Could you please confirm and close the issue if that's true?

zaleslaw avatar Dec 05 '25 13:12 zaleslaw

Yes and no.

It's still a case of https://github.com/Kotlin/dataframe/issues/1116, which will be fixed when K2 becomes the default backend for notebooks. This can be tested at the moment by enabling the registry flag kotlin.notebook.replCompilerMode.enabled and setting the kernel version to something like 0.17.0-754 in K2 mode. However, at the time of writing, even the nightly version of IntelliJ does not yet fully support this combination of settings.

So, if you plan to attach a notebook to your module, make sure your module uses the -n version of dataframe, like 1.0.0-Beta4n.
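As a rough illustration, the dependency block from the original report would then look something like this. It is a build.gradle.kts fragment only; the version string is the example quoted above, and a newer -n build may be the right choice by the time you read this.

```kotlin
// build.gradle.kts (fragment; the plugins block stays as in the original report)
dependencies {
    // "-n" build intended for modules that have a notebook attached;
    // 1.0.0-Beta4n is the example version mentioned in this thread
    implementation("org.jetbrains.kotlinx:dataframe:1.0.0-Beta4n")
    testImplementation(kotlin("test"))
}
```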

Jolanrensen avatar Dec 05 '25 13:12 Jolanrensen