dataframe icon indicating copy to clipboard operation
dataframe copied to clipboard

AddDsl with restricted type

Open Jolanrensen opened this issue 2 years ago • 9 comments

Currently, something like this is possible:

val df = dataFrameOf("X" to listOf("a", "b", "c"), "Y" to listOf(1, 2, 3))
val l by column<String>()
df.add(l) { Y }

Essentially adding a column of integers by referring to a column of strings. If a user references l again in this cell, there would be a typing error. If add is called with a typed column accessor (or KProperty), the Dsl should be limited to only accept new columns of the correct type.

Jolanrensen avatar Apr 22 '23 10:04 Jolanrensen

@koperagen please find an old commit from @nikitinas regarding this issue, it looks like it was implemented but reverted

zaleslaw avatar Apr 25 '23 15:04 zaleslaw

Actually, I think this is simply a result of Kotlin's type inference. This is the function in question:

public inline fun <reified R, T> DataFrame<T>.add(
    column: ColumnAccessor<R>,
    infer: Infer = Infer.Nulls,
    noinline expression: AddExpression<T, R>
): DataFrame<T> =
    add(column.path(), infer, expression)

R is automatically inferred from the context and since we provide typing information for R both in the column argument and the AddExpression, R is inferred to be Int and String, thus Any. This can also be seen in the result:

⌌--------------------------⌍
|  | x:String| y:Int| l:Any|
|--|---------|------|------|
| 0|        a|     1|     1|
| 1|        b|     2|     2|
| 2|        c|     3|     3|
⌎--------------------------⌏

One way to solve this is by splitting the function into:

public fun interface AddOperation<R, T> {
    public operator fun invoke(expression: AddExpression<T, R>): DataFrame<T>
}

public inline fun <reified R, T> DataFrame<T>.add(
    column: ColumnAccessor<R>,
    infer: Infer = Infer.Nulls,
): AddOperation<R, T> = AddOperation { expression ->
    add(column.path(), infer, expression)
}

That makes R only defined by column. However, thanks to how the kotlin compiler works, you can now only write the function like this:

df.add(l)() {
    y
}

or

(df.add(l)) {
    y
}

It will correctly show you've got the wrong type, but you'd have to write double brackets every time...

Jolanrensen avatar Jun 09 '23 11:06 Jolanrensen

https://youtrack.jetbrains.com/issue/KT-59207/Trailing-lambda-operator-invoke-resolution

Jolanrensen avatar Jun 09 '23 12:06 Jolanrensen

Random Sunday night thought that might just work: make expression of type AddExpression<T, S> and make S : R. Then R is defined by column and S must follow R.

Edit: Unfortunately, R is still inferred Any in this case :(

Jolanrensen avatar Jun 11 '23 22:06 Jolanrensen

First it was fixed here: 1d4e0e8ca1f1037396f1a70a041e7960ee76e8f9 But then reverted here without explanations bafff849a4771293a2c5d52b6e36219a564d12b4

koperagen avatar Jun 12 '23 11:06 koperagen

Seems like a lot of different functions break if I revert it back. Edit: oh, only like 3

But it was removed for a reason. My understanding of out/in (as is the case for many) is not optimal, so I'll need to investigate what the implications are for having a ColumnAccessor<T> v/s a ColumnAccessor<out T>

Jolanrensen avatar Jun 12 '23 12:06 Jolanrensen

Unfortunately the same issue applies for the KProperties API:

public inline fun <reified R, T> DataFrame<T>.add(
    property: KProperty<R>,
    infer: Infer = Infer.Nulls,
    noinline expression: AddExpression<T, R>
): DataFrame<T> =
    (this + mapToColumn(property, infer, expression))

KProperty has the type out V, making the function work in the same faulty way:

@DataSchema
data class Row(val x: Int, val y: String)

df.add(Row::y) { x } // type is Any
df.add(Row::x) { x } // type is Int

Jolanrensen avatar Jun 12 '23 12:06 Jolanrensen

Made a branch that restores the change again: https://github.com/Kotlin/dataframe/tree/restricted-add-dsl

Jolanrensen avatar Jun 12 '23 15:06 Jolanrensen

From: https://youtrack.jetbrains.com/issue/KT-13198

We could try:

@Suppress("INVISIBLE_MEMBER", "INVISIBLE_REFERENCE")
fun <T> KProperty0<T>.f(value: @kotlin.internal.NoInfer T)

This enforces T to come from the KProperty, not the value.

Jolanrensen avatar Jul 01 '24 07:07 Jolanrensen

This overload is deprecated

Image

koperagen avatar Oct 14 '25 12:10 koperagen