dataframe
Support Multiplatform
Most of the library code is common; the exceptions are the IO parts and Jupyter integration. We may support KMP (at least K/JS) for this library.
There's probably too much JVM reflection going on for this to be easy, let alone viable :/
Any updates?
@icecreamparlor Nope. While it would be cool, there are so many JVM dependencies in the project right now that, although it should be possible in theory, it would be a huge undertaking.
If performance is the reason to go multiplatform, I think we still have a lot to gain when the Vector API eventually hits the JVM, plus we plan to convert our Lists to primitive arrays: https://github.com/Kotlin/dataframe/issues/30.
If, aside from performance, there are other needs for multiplatform support, I would be interested in seeing a proof of concept of (part of) the API, so we can then properly decide whether it would be worth the effort or not.
- To support multiplatform, we would need to switch to kotlinx.serialization (#312). It would be nice if the parser part were split out into a separate module.
- kotlinx-io has also appeared: https://github.com/Kotlin/kotlinx-io
Doubts arose about multiplatform support in dataframe, because the library uses a lot of reflection.
I've looked into this a bit, and most of the reflection we use is in common. So if there are problems, they will be in isolated cases: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.reflect/
If you look at implementation files like https://github.com/Kotlin/dataframe/blob/master/core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/TypeUtils.kt you can see we use `jvmErasure` a lot, all over the place. I'm not sure if there's a common alternative for that. This needs to be checked.
In fact, this is the only thing I noticed that is strongly tied to the platform.
In that case, everything stays the same for JVM, a workaround can be found for Native, and for wasmJs I'm not sure it's worth supporting at all.
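As a rough illustration of the `jvmErasure` question above: for types whose classifier is already a class, the common `KType.classifier` property gives the same `KClass` without any JVM-only API. The helper name `commonErasureOrNull` is hypothetical and not part of DataFrame. Note that it does not cover type parameters, where `jvmErasure` falls back to the erasure of the upper bound.

```kotlin
import kotlin.reflect.KClass
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Hypothetical helper (not DataFrame API): approximates jvmErasure using only
// declarations available in common code. Returns null when the classifier is
// a type parameter rather than a class; jvmErasure would resolve that case
// through the parameter's upper bound instead.
fun KType.commonErasureOrNull(): KClass<*>? = classifier as? KClass<*>

fun main() {
    // Type arguments and nullability are dropped, as with an erasure.
    println(typeOf<List<Int>>().commonErasureOrNull() == List::class)
    println(typeOf<String?>().commonErasureOrNull() == String::class)
}
```

Whether this is enough depends on how often the existing code relies on the type-parameter case, which would still need its own handling.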
Gradle & KSP plugins need to be tested in multiplatform projects when the library is ready:
- `ImportDataSchema` annotation usage
- `dataframes { }` Gradle configuration usage

Shouldn't be a big deal, so treat it as a note for future testing. Not a blocker or anything.
I took a closer look at multiplatform support and conducted some experiments with it. Initially, I made some erroneous conclusions.
Here are the issues I discovered:
- Support for generated code and proper source configuration
- DataFrame plugin - there is support for different targets in ksp.
- Plugins working with code or kdoc, such as `korro` and `docProcessor`. `korro` should work in theory, as it simply finds a function and processes the function body, so common code should cause no problems. With `docProcessor` I don't know; the structure will change, so it will surely need changes too.
- Jupyter - the logic related to notebooks and Jupyter needs to be separated and moved to the jvm part
- kotlinpoet - it's multiplatform, but only supports the jvm target
- Java types - the code uses `java.time`, `BigDecimal`, `java.io`, `java.util`, `serializable`. Some of this is solvable, for example, with support for `kotlinx-datetime`, `kotlinx-io` or `okio`. `BigDecimal` and its handling need to be moved to a jvm module. The rest needs further exploration.
- `csv`, `tsv`, `json` - all our reading is jvm dependent (as is jdbc)
- concurrent - using atomic should solve the problem
- There were problems with calculating the types of receivers
- Java reflection - there's a small amount of Java reflection. Perhaps it can be replaced by `kotlin.reflect.jvm` (which is also tied to the jvm)
- Kotlin reflection - the code uses JVM reflection in many places. For example:
  - `isAbstract`
  - `KVisibility`
  - `kotlin.reflect.jvm`
    - `jvmErasure`
    - `isAccessible`
    - `javaField`
  - `kotlin.reflect.full`
    - `isSubclassOf`
    - `isSubtypeOf`
    - `isSuperclassOf`
    - `withNullability`
    - `findAnnotation`
    - `hasAnnotation`
    - `memberProperties`
    - `allSuperclasses`
    - `createType`
    - `primaryConstructor`
This is used in methods such as `convert`, `update`, `join`, `aggregate`, and others. But it's also used in `TypeUtils`, and methods from there are called when creating new columns, that is, with practically any operation. There's no simple replacement or `expect`/`actual` implementation for `jvmErasure` in Kotlin/Native, so a full refactoring of this logic is required.

In some cases, I assume the use of reflection is excessive. For type erasure, I see the following options. The simplest is to go through the data in Kotlin/Native and calculate the type, but this carries very large overhead. Another is to calculate the type when the data comes in from outside, that is, when creating a dataframe, and then keep it and reuse it consistently, which does not always happen now. Calculating a new type during operations on the dataframe needs a resolver, and this will require implementing quite complex logic. I also assume that some of the reflection problems will be solved with the help of a compiler plugin.
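To make the "calculate the type once and keep it" option concrete, here is a minimal sketch that captures the `KType` with `typeOf` at creation time and again on each transformation, so the values never need to be inspected. `TypedColumn`, `columnOf` and `map` are hypothetical names for illustration, not DataFrame's actual column API, and the sketch ignores the harder cases where the result type is only known at runtime.

```kotlin
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Hypothetical minimal column (not DataFrame's DataColumn): the element type
// is stored next to the data instead of being recomputed from the values.
class TypedColumn<T>(val name: String, val values: List<T>, val type: KType)

// The type is captured at the call site through a reified parameter,
// which works in common code without JVM reflection.
inline fun <reified T> columnOf(name: String, vararg values: T): TypedColumn<T> =
    TypedColumn(name, values.toList(), typeOf<T>())

// A transformation captures the result type the same way, so no
// runtime scan of the new values is needed.
inline fun <T, reified R> TypedColumn<T>.map(transform: (T) -> R): TypedColumn<R> =
    TypedColumn(name, values.map(transform), typeOf<R>())

fun main() {
    val ages = columnOf("age", 1, 2, 3)
    val labels = ages.map { "n=$it" }
    println(ages.type == typeOf<Int>())
    println(labels.type == typeOf<String>())
}
```

This only covers operations whose result type is known at compile time; operations that derive types from the data would still need the resolver mentioned above.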
As a result, I do not see multiplatform support with Kotlin/Native (iOS) as feasible in the near future: it requires a lot of effort, and some of the problems could presumably be solved by Kotlin itself in the future. Multiplatform support only for JVM and Android seems like a more realistic task but will require:
- project configuration
- code refactoring to separate the logic, so that the common module ideally contains very limited but isolated code
- support for tasks such as #312 and #587, and support for kotlinx-io
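For the project configuration item, a JVM plus Android setup might look roughly like the `build.gradle.kts` fragment below. This is a sketch under assumptions, not the project's actual build: plugin versions and the required `android { }` block are omitted, and the split of code between source sets is only indicated in comments.

```kotlin
// build.gradle.kts (sketch; plugin versions and the android { } block are omitted)
plugins {
    kotlin("multiplatform")
    id("com.android.library")
}

kotlin {
    jvm()
    androidTarget()

    sourceSets {
        commonMain.dependencies {
            // Only the limited, isolated platform-independent code would live here.
        }
        jvmMain.dependencies {
            // Reflection-heavy code, BigDecimal handling and Jupyter integration
            // would stay on the JVM side.
            implementation(kotlin("reflect"))
        }
    }
}
```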