AddDsl with restricted type
Currently, something like this is possible:
val df = dataFrameOf("X" to listOf("a", "b", "c"), "Y" to listOf(1, 2, 3))
val l by column<String>()
df.add(l) { Y }
Essentially adding a column of integers by referring to a column of strings. If a user references l again in this cell, there would be a typing error.
If add is called with a typed column accessor (or KProperty), the Dsl should be limited to only accept new columns of the correct type.
@koperagen please find an old commit from @nikitinas regarding this issue, it looks like it was implemented but reverted
Actually, I think this is simply a result of Kotlin's type inference. This is the function in question:
public inline fun <reified R, T> DataFrame<T>.add(
column: ColumnAccessor<R>,
infer: Infer = Infer.Nulls,
noinline expression: AddExpression<T, R>
): DataFrame<T> =
add(column.path(), infer, expression)
R is automatically inferred from the context and since we provide typing information for R both in the column argument and the AddExpression, R is inferred to be Int and String, thus Any. This can also be seen in the result:
⌌--------------------------⌍
| | x:String| y:Int| l:Any|
|--|---------|------|------|
| 0| a| 1| 1|
| 1| b| 2| 2|
| 2| c| 3| 3|
⌎--------------------------⌏
One way to solve this is by splitting the function into:
public fun interface AddOperation<R, T> {
public operator fun invoke(expression: AddExpression<T, R>): DataFrame<T>
}
public inline fun <reified R, T> DataFrame<T>.add(
column: ColumnAccessor<R>,
infer: Infer = Infer.Nulls,
): AddOperation<R, T> = AddOperation { expression ->
add(column.path(), infer, expression)
}
That makes R only defined by column. However, thanks to how the kotlin compiler works, you can now only write the function like this:
df.add(l)() {
y
}
or
(df.add(l)) {
y
}
It will correctly show you've got the wrong type, but you'd have to write double brackets every time...
https://youtrack.jetbrains.com/issue/KT-59207/Trailing-lambda-operator-invoke-resolution
Random Sunday night thought that might just work: make expression of type AddExpression<T, S> and make S : R. Then R is defined by column and S must follow R.
Edit: Unfortunately, R is still inferred Any in this case :(
First it was fixed here: 1d4e0e8ca1f1037396f1a70a041e7960ee76e8f9 But then reverted here without explanations bafff849a4771293a2c5d52b6e36219a564d12b4
Seems like a lot of different functions break if I revert it back. Edit: oh, only like 3
But it was removed for a reason. My understanding of out/in (as is the case for many) is not optimal, so I'll need to investigate what the implications are for having a ColumnAccessor<T> v/s a ColumnAccessor<out T>
Unfortunately the same issue applies for the KProperties API:
public inline fun <reified R, T> DataFrame<T>.add(
property: KProperty<R>,
infer: Infer = Infer.Nulls,
noinline expression: AddExpression<T, R>
): DataFrame<T> =
(this + mapToColumn(property, infer, expression))
KProperty has the type out V, making the function work in the same faulty way:
@DataSchema
data class Row(val x: Int, val y: String)
df.add(Row::y) { x } // type is Any
df.add(Row::x) { x } // type is Int
Made a branch that restores the change again: https://github.com/Kotlin/dataframe/tree/restricted-add-dsl
From: https://youtrack.jetbrains.com/issue/KT-13198
We could try:
@Suppress("INVISIBLE_MEMBER", "INVISIBLE_REFERENCE")
fun <T> KProperty0<T>.f(value: @kotlin.internal.NoInfer T)
This enforces T to come from the KProperty, not the value.
This overload is deprecated