
[Improvement][Spark]: Neo4j2GraphAr changes the data type

Open jiajunly opened this issue 2 years ago • 4 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

[Screenshot]

In Neo4j, the property's data type is Long. After the import, it becomes string when any value is null, as the screenshot shows.

Expected Behavior

The property should keep its Long type after the import, even when some values are null.

Minimal Reproducible Example

Maybe this.

Environment

  • Operating system: Ubuntu
  • GraphAr version: latest

Link to GraphAr Logs

No response

Further Information

No response

jiajunly • Jan 26 '24 10:01

[Image] During the import, the first warning log may contain key information, as the image shows.

jiajunly • Jan 26 '24 11:01

Maybe we can use the APOC procedure, but it is only available in the Enterprise Edition.

acezen • Jan 31 '24 12:01

Another solution is to change the option schema.flatten.limit to 1 (the default is 10). I changed the code at https://github.com/alibaba/GraphAr/blob/fe4ebb9b2dbcf30e63cd3895a51d4d614fe80df3/spark/src/main/scala/com/alibaba/graphar/example/Neo4j2GraphAr.scala#L75-L78 to

    val person_df = spark.read
      .format("org.neo4j.spark.DataSource")
      .option("schema.flatten.limit", 1)
      .option("query", "MATCH (n:Person) RETURN n.name AS name, n.born as born")
      .load()

and it works; the schema of person_df becomes

root
 |-- name: string (nullable = true)
 |-- born: long (nullable = true)
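
If relying on a single sampled row turns out to be too fragile, one possible alternative is to cast the column explicitly after loading. The following is only a minimal sketch under stated assumptions, not part of Neo4j2GraphAr: the object `CastBornToLong`, the helper `castBornToLong`, and the sample rows are hypothetical, and a small in-memory DataFrame stands in for the one returned by the Neo4j connector when `born` was inferred as string.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.LongType

// Hypothetical helper object, used only for illustration.
object CastBornToLong {

  // Cast the `born` column to long; null values stay null after the cast.
  def castBornToLong(df: DataFrame): DataFrame =
    df.withColumn("born", col("born").cast(LongType))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[1]")
      .appName("cast-born-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the connector output: `born` arrives as
    // string, and one row has no value at all.
    val inferred = Seq[(String, String)](
      ("Alice", "1980"),
      ("Bob", null)
    ).toDF("name", "born")

    val fixed = castBornToLong(inferred)
    fixed.printSchema() // `born` should now be reported as long

    spark.stop()
  }
}
```

In the example, the same `withColumn` call would be applied to `person_df` right after `.load()`. The cast makes the intended type explicit instead of depending on which rows the connector happens to sample.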

acezen • Jan 31 '24 12:01

I think this is not a GraphAr bug, so I changed the title and label to improvement.

acezen • Feb 01 '24 09:02