[Improvement][Spark]: Neo4j2GraphAr changes the column data type
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
The original data type in Neo4j is Long, but after the importer runs, the column becomes a string when it contains a null value, as shown below.
Expected Behavior
The column should keep its original Long type after import.
Minimal Reproducible Example
Run the Neo4j2GraphAr example against a Neo4j dataset in which some Person nodes have a null born property (see the code under Further Information).
Environment
- Operating system: Ubuntu
- GraphAr version: latest
Link to GraphAr Logs
No response
Further Information
No response
When running the import, the first warning in the log may be the key piece of information, as the attached image shows.
One option is to use the APOC procedure, but it is only available in the Enterprise Edition.
Another solution is to set the option `schema.flatten.limit` to 1 (the default is 10). I changed the code at https://github.com/alibaba/GraphAr/blob/fe4ebb9b2dbcf30e63cd3895a51d4d614fe80df3/spark/src/main/scala/com/alibaba/graphar/example/Neo4j2GraphAr.scala#L75-L78 to:
```scala
val person_df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("schema.flatten.limit", 1)
  .option("query", "MATCH (n:Person) RETURN n.name AS name, n.born as born")
  .load()
```
and it works; the schema of `person_df` becomes:

```
root
 |-- name: string (nullable = true)
 |-- born: long (nullable = true)
```
I think this is not a bug in GraphAr, so I have changed the title and label to improvement.
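For cases where lowering `schema.flatten.limit` is not desirable, a possible workaround (a minimal sketch, not the change used above) is to cast the affected column back to `LongType` after loading. The helper name `restoreLongColumn` and the use of the `born` column are assumptions for illustration; Spark's `cast` yields null for values that cannot be parsed as longs, so nulls stay null:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.LongType

// Hypothetical helper: cast a column that the Neo4j Spark connector
// inferred as string (because of nulls) back to long.
def restoreLongColumn(df: DataFrame, name: String): DataFrame =
  df.withColumn(name, col(name).cast(LongType))

// Usage sketch, assuming person_df was loaded as in the example above:
// val fixed_df = restoreLongColumn(person_df, "born")
// fixed_df.printSchema()  // born: long (nullable = true)
```

This keeps the connector options untouched and only normalizes the schema on the Spark side, at the cost of an extra projection per affected column.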