Generation of Data Schemas from JSON could be better
Let's say I have the following sample data from the Magical Location Clock app:
{
  "0": {
    "name": "Kwijt 🤷",
    "people": [
      "Ronald",
      "Jolan"
    ],
    "peopleIds": [
      1544198659362550,
      1556728145307124
    ]
  },
  "164468523846839": {
    "name": "Joyce ❤️",
    "people": [],
    "peopleIds": [],
    "gps": {
      "latitude": 59.5391564418419,
      "longitude": 8.09191055862616
    }
  },
  "1644685267012957": {
    "name": "Joyce ❤️",
    "people": [],
    "peopleIds": [],
    "gps": {
      "latitude": 52.19508702169525,
      "longitude": 1.414078822875235
    }
  },
  "1659257255454626": {
    "name": "Camping 🏕️",
    "people": [
      "Pascale"
    ],
    "peopleIds": [
      1545419932562464
    ],
    "gps": {
      "latitude": 43.1857803898036,
      "longitude": 2.1369805421961723
    }
  }
}
Then the Gradle plugin generates:
@DataSchema(isOpen = false)
interface Mlc1 {
    val name: String
    val people: kotlin.collections.List<kotlin.String>
    val peopleIds: kotlin.collections.List<kotlin.Long>
}

@DataSchema(isOpen = false)
interface Mlc3 {
    val latitude: Double
    val longitude: Double
}

@DataSchema(isOpen = false)
interface Mlc4

@DataSchema(isOpen = false)
interface Mlc2 {
    val gps: DataRow<Mlc3>
    val name: String
    val people: DataFrame<Mlc4>
    val peopleIds: DataFrame<Mlc4>
}

@DataSchema(isOpen = false)
interface Mlc5 : Mlc1 {
    val gps: DataRow<Mlc3>
}

@DataSchema
interface Mlc {
    @ColumnName("0")
    val `0`: DataRow<Mlc1>
    @ColumnName("164468523846839")
    val `164468523846839`: DataRow<Mlc2>
    @ColumnName("1644685267012957")
    val `1644685267012957`: DataRow<Mlc2>
    @ColumnName("1659257255454626")
    val `1659257255454626`: DataRow<Mlc5>

    public companion object {
        public const val defaultPath: kotlin.String = "src/main/resources/mlcTest.json"
        public fun readJson(path: kotlin.String = defaultPath, verify: kotlin.Boolean? = null): org.jetbrains.kotlinx.dataframe.DataFrame<Mlc> {
            val df = DataFrame.readJson(path, )
            return if (verify != null) df.cast(verify = verify) else df.cast()
        }
    }
}
Some things are curious here:
- Mlc5 could be the same as Mlc1.
- people generates either as an empty dataframe or a list (not sure why not just a List<Nothing> or Any?).
- Lack of an attribute like gps could be treated by making it nullable.
Alternatively, supporting something like OpenAPI https://github.com/Kotlin/dataframe/issues/142 could help prevent these issues.
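To make the nullable and empty-array suggestions above a bit more concrete, here is a minimal sketch in plain Kotlin of how per-entry schemas could be merged into one. It is not the plugin's actual implementation: the `Field` class and the `render` and `merge` functions are hypothetical names made up for illustration, and the type model is deliberately simplified to strings.

```kotlin
// Hypothetical, simplified model of an inferred field (not the DataFrame plugin's internals).
// An empty JSON array is modelled as a list whose element type is "Nothing".
data class Field(val type: String, val isList: Boolean = false, val nullable: Boolean = false)

// Renders a field as a Kotlin type string, e.g. "List<String>" or "Gps?".
fun render(f: Field): String {
    val base = if (f.isList) "List<${f.type}>" else f.type
    return if (f.nullable) "$base?" else base
}

// Merges the schemas of several entries into a single schema:
//  - a field missing from some entries becomes nullable;
//  - a known element type wins over "Nothing" from an empty array.
fun merge(rows: List<Map<String, Field>>): Map<String, Field> {
    val names = rows.flatMap { it.keys }.toSortedSet()
    return names.associateWith { name ->
        val seen = rows.mapNotNull { it[name] }
        val best = seen.firstOrNull { it.type != "Nothing" } ?: seen.first()
        best.copy(nullable = seen.size < rows.size)
    }
}

fun main() {
    // Shape of entry "0": non-empty arrays, no gps.
    val withPeople = mapOf(
        "name" to Field("String"),
        "people" to Field("String", isList = true),
        "peopleIds" to Field("Long", isList = true),
    )
    // Shape of the "Joyce" entries: empty arrays, gps present.
    val withGps = mapOf(
        "name" to Field("String"),
        "people" to Field("Nothing", isList = true),
        "peopleIds" to Field("Nothing", isList = true),
        "gps" to Field("Gps"),
    )
    merge(listOf(withPeople, withGps)).forEach { (name, field) ->
        println("val $name: ${render(field)}")
    }
    // Prints:
    // val gps: Gps?
    // val name: String
    // val people: List<String>
    // val peopleIds: List<Long>
}
```

Under these assumptions, all four entries would share one schema with a nullable gps, instead of Mlc1, Mlc2 and Mlc5 being generated separately.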
So, as it turns out, an empty JSON array is turned into an empty dataframe, while a non-empty JSON array is turned into a normal list. Kinda odd, isn't it?
Tackled in https://github.com/Kotlin/dataframe/pull/173