Unable to read any data using custom schemas

Open swayam18 opened this issue 5 years ago • 10 comments

I am unable to read any data from Neo4j using this library and a custom schema.

Here is how I am setting up the connection:

object ExtractUsernames extends MorpheusApp {
  val neo4j = connectNeo4j("bolt://xx.xx.xx.xx:7687", "neo4j", "xxxxxxxxx")
  implicit val morpheus: MorpheusSession = MorpheusSession.local()

  val schemaFile = Source.fromFile(getClass.getResource("/schema.json").getPath).getLines.mkString
  val schema = PropertyGraphSchema.fromJson(schemaFile)
  private val datasource = GraphSources.cypher.neo4j(neo4j.config, Some(schema))

  morpheus.registerSource(Namespace("Neo4j"), datasource)
}

And this is what our schema.json looks like:

{
  "version": 1,
  "labelPropertyMap": [
    {
      "labels": [
        "User"
      ],
      "properties": {
        "username": "STRING"
      }
    }
  ],
  "relTypePropertyMap": []
}

This is the query I am trying to run:

val result = morpheus.cypher(
  s"""
     |FROM Neo4j.graph
     |MATCH (i:User)
     |RETURN i.username
   """.stripMargin)

result.show

However, when I run the query, I get an empty table:

╔════════════╗
║ i.username ║
╚════════════╝
(no rows)

I have verified that:

  • [x] The schema file can be read.
  • [x] Neo4j is accessible and the port is open to the machine running this code.
  • [x] There is a graph called Neo4j.graph available in morpheus.catalog.graphNames
  • [x] The same error is present in at least 2 versions of Neo4j: 3.2.0 and 3.4.1

Morpheus version: 0.4.0
Spark version: 2.4.3
Neo4j version: 3.2.0 and 3.4.1

Does anyone have any idea as to what is going on?

swayam18 avatar May 27 '19 11:05 swayam18

Does the team need me to add any more details? @s1ck

swayam18 avatar May 29 '19 02:05 swayam18

I could not reproduce the error on master, Spark 2.4.3 and Neo4j 3.4.1.

Does the query return any results when executed in Neo4j, e.g. via browser? Maybe you misspelled the label (lower/uppercase)?

AFAIK there were no recent changes to the Neo4j data source, but you could still try Morpheus 0.4.1.

s1ck avatar May 29 '19 07:05 s1ck

I ran the same query in the Neo4j browser and it works fine.

Did you try your test with a custom schema?

swayam18 avatar May 31 '19 06:05 swayam18

Yes, I did the exact same experiment.

s1ck avatar Jun 10 '19 08:06 s1ck

Closing this as not reproducible. Please reopen if this is still an issue.

Mats-SX avatar Jun 17 '19 08:06 Mats-SX

@Mats-SX Can the custom schema be a subset of the actual graph schema? I still can't get it to work, even after updating to Morpheus 0.4.2.

swayam18 avatar Jul 25 '19 11:07 swayam18

@Mats-SX I figured out that this happens when nodes have more than one label on them. You can reproduce this by adding another label, say :Offline to every :User, without adding this new label to the schema.

I found a query being run which is related to this:

MATCH (e:`User`)
WHERE length(labels(e)) = 1
RETURN id(e) AS ___morpheusID, e.username

Why does Morpheus check the number of labels on a node?

swayam18 avatar Jul 25 '19 12:07 swayam18

Ah, that makes sense! That's an interesting idea. At the moment, no, we don't allow the schema to be a subset of the actual schema, but I can definitely see why you would be interested in that sort of feature. I don't immediately see why we wouldn't allow that, but I'll need to discuss it with the team.

Nice find, thank you!

Mats-SX avatar Jul 26 '19 07:07 Mats-SX

Great, thanks for the update. Since Neo4j doesn't enforce a strict schema, it makes sense to let users decide which properties they care about. On your end, you can simply reject the Spark job if there is a mismatch between the data being read and the schema it expects.

swayam18 avatar Jul 30 '19 08:07 swayam18

Actually there is a way one could achieve this. The problem is about how the Morpheus schema represents nodes internally. Nodes are grouped by their label set, e.g. :User:Offline and :User:Online. The JSON representation uses the same separation.
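
To make the grouping concrete, here is a minimal sketch in plain Scala. It is not Morpheus code: `Node`, `scan`, and the sample data are hypothetical, and the only assumption taken from this thread is that a schema entry matches nodes whose label set is exactly the declared combination.

```scala
// Hypothetical model of exact label-set matching; not the Morpheus implementation.
object LabelSetMatching {
  final case class Node(labels: Set[String], username: String)

  // Nodes returned for one schema label combination, assuming a node only
  // matches when its full label set equals the declared combination.
  def scan(nodes: Seq[Node], combination: Set[String]): Seq[Node] =
    nodes.filter(_.labels == combination)

  // Sample data: every :User carries a second label.
  val nodes: Seq[Node] = Seq(
    Node(Set("User", "Offline"), "alice"),
    Node(Set("User", "Online"), "bob")
  )

  // Schema declaring only {User}: no node has exactly that label set.
  val subsetSchema: Seq[Set[String]] = Seq(Set("User"))

  // Schema declaring both combinations that exist in the data.
  val fullSchema: Seq[Set[String]] =
    Seq(Set("User", "Offline"), Set("User", "Online"))

  // Collects usernames across all combinations declared in the schema.
  def usernames(schema: Seq[Set[String]]): Seq[String] =
    schema.flatMap(scan(nodes, _)).map(_.username)
}
```

Under this model, `usernames(subsetSchema)` is empty while `usernames(fullSchema)` contains both users, which mirrors the empty table reported at the top of the issue.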

So if you want to query for all users, the schema has to describe every existing label combination that includes :User:

{
  "version": 1,
  "labelPropertyMap": [
    {
      "labels": [
        "User",
        "Offline"
      ],
      "properties": {
        "username": "STRING"
      }
    },
    {
      "labels": [
        "User",
        "Online"
      ],
      "properties": {
        "username": "STRING"
      }
    }
  ],
  "relTypePropertyMap": []
}

This will then allow you to query for every existing :User, just as you did above:

FROM Neo4j.graph
MATCH (i:User)
RETURN i.username

This is of course not ideal, but will allow you to accomplish the job.
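
If there are many label combinations, writing the entries by hand gets tedious. The sketch below generates the same JSON layout from a list of combinations. `SchemaJson`, `entry`, and `schema` are hypothetical helper names, not part of Morpheus, and the code assumes labels and property names need no JSON escaping. The combinations themselves would have to come from the data, for example by running something like `MATCH (n:User) RETURN DISTINCT labels(n)` in Neo4j.

```scala
// Hypothetical helper that renders a schema.json string; not part of Morpheus.
object SchemaJson {
  // Wraps a value in double quotes; assumes it needs no JSON escaping.
  private def quote(s: String): String = "\"" + s + "\""

  // Renders one labelPropertyMap entry for a single label combination.
  def entry(labels: Seq[String], properties: Seq[(String, String)]): String = {
    val labelJson = labels.map(quote).mkString("[", ", ", "]")
    val propJson = properties
      .map { case (key, cypherType) => quote(key) + ": " + quote(cypherType) }
      .mkString("{", ", ", "}")
    "{" + quote("labels") + ": " + labelJson + ", " +
      quote("properties") + ": " + propJson + "}"
  }

  // Renders a full schema with one entry per label combination,
  // all sharing the same properties, and no relationship types.
  def schema(combinations: Seq[Seq[String]], properties: Seq[(String, String)]): String = {
    val entries = combinations.map(entry(_, properties)).mkString("[", ", ", "]")
    "{" + quote("version") + ": 1, " +
      quote("labelPropertyMap") + ": " + entries + ", " +
      quote("relTypePropertyMap") + ": []}"
  }
}
```

For example, `SchemaJson.schema(Seq(Seq("User", "Offline"), Seq("User", "Online")), Seq("username" -> "STRING"))` yields a single-line equivalent of the schema above, which could then be passed to `PropertyGraphSchema.fromJson` as in the original setup.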

DarthMax avatar Jul 30 '19 10:07 DarthMax