Add inner / Struct type support in Arrow

Open fb64 opened this issue 1 year ago • 1 comments

An Arrow Struct value is currently read as a Map<String, Any?> object: https://github.com/Kotlin/dataframe/blob/86b80e0c9cd372334e8eff05115a7c50b6ea61bc/dataframe-arrow/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/arrowReadingImpl.kt#L171-L173

However, writing does not support Map objects, and by default the value is serialized as a String: https://github.com/Kotlin/dataframe/blob/86b80e0c9cd372334e8eff05115a7c50b6ea61bc/dataframe-arrow/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/arrowTypesMatching.kt#L93-L95

The following test fails because the c column holds a LinkedHashMap in a SingletonList in the expected DataFrame, but a single String in an ArrayList in the readIpc result:

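    import io.kotest.matchers.shouldBe
    import org.apache.arrow.vector.util.Text
    import org.jetbrains.kotlinx.dataframe.DataFrame
    import org.jetbrains.kotlinx.dataframe.api.*
    import org.jetbrains.kotlinx.dataframe.io.*
    import org.junit.Test

    @Test
    fun testReadIPC() {
        val a by columnOf("one")
        val b by columnOf(2.0)
        // Single-row column whose value is a nested map (the "inner" type).
        val c by listOf(
            mapOf(
                "c1" to Text("inner"),
                "c2" to 4.0,
                "c3" to 50.0,
            ) as Map<String, Any?>
        ).toColumn()
        val d by columnOf("four")
        val expected = dataFrameOf(a, b, c, d)
        // Round trip: write to Arrow IPC bytes, then read them back.
        val readIpc = DataFrame.readArrowIPC(expected.saveArrowIPCToByteArray())
        // Fails: column c comes back as a String instead of a Map.
        readIpc shouldBe expected
    }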
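
The mismatch can be illustrated without Arrow at all. The sketch below assumes that the writer's fallback behaves like calling toString() on the value; `fallbackSerialize` is a hypothetical stand-in, not the library's actual function. Once the map has been flattened to text, the reader has no way to recover the original structure, so the equality check cannot succeed.

```kotlin
// Hypothetical stand-in for the writer's fallback path.
// Assumption: unsupported values end up as their toString() representation.
fun fallbackSerialize(value: Any?): String = value.toString()

fun main() {
    val original: Map<String, Any?> = mapOf("c1" to "inner", "c2" to 4.0, "c3" to 50.0)

    // What a round trip would hand back under the assumption above.
    val roundTripped: Any? = fallbackSerialize(original)

    println(roundTripped)             // {c1=inner, c2=4.0, c3=50.0}
    println(roundTripped == original) // false: a String is never equal to a Map
}
```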

It could be relevant to add support for inner types by writing Map<String, Any?> values to a Struct field. Some points need to be addressed before implementation:

  • Should it support nested inner types (a struct inside a struct, and so on) recursively?
  • Should it support only Map<String, Any?>?
  • Is a Struct field the better choice, given that Arrow also supports a Map type?
  • Maybe inner types should be specified in the dataframe core and implemented in a consistent way for all supported types.
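
To make the recursion question concrete, here is a minimal sketch of how a nested Map<String, Any?> could be mapped to a struct-like type description. Everything in it (`FieldType`, `Primitive`, `Struct`, `inferType`, and the type names such as "Float64") is hypothetical and exists only for illustration; it does not use the Arrow or DataFrame APIs. It also only accepts String keys, which is one reason a Struct fits this shape: a Struct has a fixed set of named fields, whereas Arrow's Map type holds arbitrary key/value entries per row.

```kotlin
// Hypothetical type description, not part of DataFrame or Arrow.
sealed interface FieldType
data class Primitive(val name: String) : FieldType
data class Struct(val fields: Map<String, FieldType>) : FieldType

// Recursively derive a type description from a value.
// Nested maps become nested Structs, so "inner of inner" is handled naturally.
fun inferType(value: Any?): FieldType = when (value) {
    null -> Primitive("Null")
    is Map<*, *> -> Struct(
        value.entries.associate { entry ->
            // Struct fields need names, so only String keys are accepted here.
            val name = entry.key as? String
                ?: throw IllegalArgumentException("Non-String key: ${entry.key}")
            name to inferType(entry.value)
        }
    )
    is String -> Primitive("Utf8")
    is Double -> Primitive("Float64")
    is Int -> Primitive("Int32")
    is Long -> Primitive("Int64")
    is Boolean -> Primitive("Bool")
    // Assumption: anything else falls back to a string, like the current writer.
    else -> Primitive("Utf8")
}

fun main() {
    val row = mapOf("c1" to "inner", "c2" to 4.0, "nested" to mapOf("x" to 1))
    println(inferType(row))
}
```

A real implementation would also have to reconcile the types inferred from different rows of the same column, which this sketch leaves out.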

Originally posted by @fb64 in https://github.com/Kotlin/dataframe/issues/528#issuecomment-1843132618

fb64, Dec 12 '23 13:12