Add inner / Struct type support in Arrow
The Arrow Struct type is read as a Map<String, Any?> object:
https://github.com/Kotlin/dataframe/blob/86b80e0c9cd372334e8eff05115a7c50b6ea61bc/dataframe-arrow/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/arrowReadingImpl.kt#L171-L173
However, writing does not support Map objects: by default, the value is serialized as a String:
https://github.com/Kotlin/dataframe/blob/86b80e0c9cd372334e8eff05115a7c50b6ea61bc/dataframe-arrow/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/arrowTypesMatching.kt#L93-L95
The following test fails because the c column is a LinkedHashMap in a SingletonList in the expected DataFrame, but a single String in an ArrayList in the readIpc object:
@Test
fun testReadIPC() {
    val a by columnOf("one")
    val b by columnOf(2.0)
    val c by listOf(
        mapOf(
            "c1" to Text("inner"),
            "c2" to 4.0,
            "c3" to 50.0,
        ) as Map<String, Any?>
    ).toColumn()
    val d by columnOf("four")
    val expected = dataFrameOf(a, b, c, d)
    val readIpc = DataFrame.readArrowIPC(expected.saveArrowIPCToByteArray())
    readIpc shouldBe expected
}
It could be useful to add support for inner types by writing a Map<String, Any?> as a Struct field.
Some points need to be addressed before implementation:
- Should it support inner types of inner types, recursively?
- Should it support only Map<String, Any?>?
- Is a Struct field the best choice, given that Arrow also supports a Map type?
- Maybe inner types should be specified in the dataframe core and implemented in a consistent way for all supported types.
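To make the recursion question concrete, here is a minimal sketch in plain Kotlin of how a nested Map<String, Any?> could be walked to derive a struct-like schema. The names FieldKind, Leaf, Struct and describe are hypothetical, invented for illustration only; they are not part of dataframe or Arrow, and no Arrow vectors are written here.

```kotlin
// Hypothetical type model for illustration; not a dataframe or Arrow API.
sealed interface FieldKind
data class Leaf(val typeName: String) : FieldKind
data class Struct(val children: Map<String, FieldKind>) : FieldKind

// Recursively describe a value: a Map with String keys becomes a Struct,
// null becomes Leaf("Null"), anything else a Leaf tagged with its class name.
fun describe(value: Any?): FieldKind = when (value) {
    null -> Leaf("Null")
    is Map<*, *> -> Struct(
        value.entries.associate { (key, inner) ->
            // Assumption: only String keys can be mapped to struct field names.
            require(key is String) { "Struct field names must be Strings, got $key" }
            key to describe(inner)
        }
    )
    else -> Leaf(value::class.simpleName ?: "Unknown")
}

fun main() {
    // Same shape as the c column in the test, with one extra nesting level.
    val row = mapOf("c1" to "inner", "c2" to 4.0, "c3" to mapOf("deep" to 50.0))
    println(describe(row))
}
```

In this sketch, recursion costs only one extra branch. The harder part is probably agreeing on a single schema across all rows of a column, which is not attempted here.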
Originally posted by @fb64 in https://github.com/Kotlin/dataframe/issues/528#issuecomment-1843132618