morpheus
Unable to read any data using custom schemas
I am unable to read any data from Neo4j using this library and a custom schema.
Here is how I am setting up the connection:
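```scala
import scala.io.Source
// Morpheus/okapi imports were not included in the original report

object ExtractUsernames extends MorpheusApp {
  // Bolt connection to Neo4j (address and password redacted)
  val neo4j = connectNeo4j("bolt://xx.xx.xx.xx:7687", "neo4j", "xxxxxxxxx")
  implicit val morpheus: MorpheusSession = MorpheusSession.local()
  // Load the custom schema from the classpath resource /schema.json
  val schemaFile = Source.fromFile(getClass.getResource("/schema.json").getPath).getLines.mkString
  val schema = PropertyGraphSchema.fromJson(schemaFile)
  // Register the Neo4j data source under the "Neo4j" namespace, using the custom schema
  private val datasource = GraphSources.cypher.neo4j(neo4j.config, Some(schema))
  morpheus.registerSource(Namespace("Neo4j"), datasource)
}
```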
And this is what our schema.json looks like:

```json
{
  "version": 1,
  "labelPropertyMap": [
    {
      "labels": ["User"],
      "properties": {
        "username": "STRING"
      }
    }
  ],
  "relTypePropertyMap": []
}
```
This is the query I am trying to run:
```scala
val result = morpheus.cypher(
  s"""
     |FROM Neo4j.graph
     |MATCH (i:User)
     |RETURN i.username
  """.stripMargin)
result.show
```
However, when I run the query, I get an empty table:
```
╔════════════╗
║ i.username ║
╚════════════╝
(no rows)
```
I have verified that:
- [x] The schema file can be read.
- [x] Neo4j is accessible and the port is open to the machine running this code.
- [x] There is a graph called `Neo4j.graph` available in `morpheus.catalog.graphNames`.
- [x] The same behaviour occurs in at least two versions of Neo4j: 3.2.0 and 3.4.1.
Morpheus version: 0.4.0
Spark version: 2.4.3
Neo4j version: 3.2.0 and 3.4.1
Does anyone have any idea as to what is going on?
Does the team need me to add any more details? @s1ck
I could not reproduce the error on master, Spark 2.4.3 and Neo4j 3.4.1.
Does the query return any results when executed in Neo4j, e.g. via browser? Maybe you misspelled the label (lower/uppercase)?
AFAIK there were no recent changes to the Neo4j data source, but you could still try Morpheus 0.4.1.
I ran the same query in the browser and it works fine.
Did you try your test with a custom schema?
Yes, I did the exact same experiment.
Closing this as not reproducible. Please reopen if this is still an issue.
@Mats-SX Can the custom schema be a subset of the actual graph schema? I still can't get it to work, even after updating to Morpheus 0.4.2.
@Mats-SX I figured out that this happens when nodes have more than one label.
You can reproduce it by adding another label, say `:Offline`, to every `:User` node without adding the new label to the schema.
I found a query being run which is related to this:
```cypher
MATCH (e:`User`)
WHERE length(labels(e)) = 1
RETURN id(e) AS ___morpheusID, e.username
```
Why does morpheus check for the number of labels on a node?
Ah, that makes sense! That's an interesting idea. At the moment, no: we don't allow the schema to be a subset of the actual graph schema, but I can definitely see why you would want that feature. I don't see an obvious reason not to allow it, but I'll need to discuss it with the team.
Nice find, thank you!
Great, thanks for the update. Since Neo4j doesn't enforce a strict schema, it makes sense to let users decide which properties they care about. On your end, you can simply reject the Spark job if there is a mismatch between the data being read and the schema it expects.
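A rough sketch of such a pre-flight check, independent of Morpheus: it compares the distinct label sets found in the data against the label sets declared in the schema and fails if any are missing. The object and method names are hypothetical, only label sets are checked (not property types), and the observed label sets are assumed to come from a query such as `MATCH (n) RETURN DISTINCT labels(n)`.

```scala
import scala.util.Try

// Hypothetical check, not part of Morpheus: fail fast when the data contains
// label combinations that the schema does not describe exactly.
object SchemaCoverageCheck {
  // Assumed schema shape: exact label combination -> property keys
  type Schema = Map[Set[String], Set[String]]

  // Label sets observed in the data that have no exact entry in the schema
  def uncovered(observed: Set[Set[String]], schema: Schema): Set[Set[String]] =
    observed.filterNot(schema.contains)

  // Throws IllegalArgumentException (via require) if coverage is incomplete
  def validate(observed: Set[Set[String]], schema: Schema): Unit = {
    val missing = uncovered(observed, schema)
    require(
      missing.isEmpty,
      s"Schema does not cover label sets: ${missing.map(_.toList.sorted.mkString(":")).mkString(", ")}"
    )
  }

  def main(args: Array[String]): Unit = {
    val observed = Set(Set("User", "Offline"), Set("User", "Online"))
    val userOnly: Schema = Map(Set("User") -> Set("username"))
    val allCombos: Schema = Map(
      Set("User", "Offline") -> Set("username"),
      Set("User", "Online") -> Set("username")
    )
    println(Try(validate(observed, userOnly)).isFailure)  // true
    println(Try(validate(observed, allCombos)).isSuccess) // true
  }
}
```

Rejecting the job this way turns a silently empty result into an explicit error.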
Actually there is a way one could achieve this.
The problem lies in how the Morpheus schema represents nodes internally. Nodes are grouped by their label set, e.g. `:User:Offline` and `:User:Online`. The JSON representation uses the same separation.
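To make the grouping concrete, here is a toy, library-free Scala model of the behaviour described above. It does not call Morpheus; `LabelSetScan`, `Node`, `scan` and `matchLabel` are hypothetical names, and it assumes (as the generated `WHERE length(labels(e)) = 1` query suggests) that each schema entry only picks up nodes whose label set matches it exactly.

```scala
// Toy model only: mimics exact label-set matching, does not use Morpheus.
object LabelSetScan {
  final case class Node(id: Long, labels: Set[String], props: Map[String, String])

  // Assumed schema shape: exact label combination -> property keys
  type Schema = Map[Set[String], Set[String]]

  // Each schema entry selects only nodes whose label set equals that entry.
  // For the entry Set("User") this behaves like `MATCH (e:User) WHERE length(labels(e)) = 1`.
  def scan(schema: Schema, nodes: Seq[Node]): List[Node] =
    schema.keys.toList.flatMap(ls => nodes.filter(_.labels == ls))

  // MATCH (i:User) is answered from every schema entry whose labels include "User"
  def matchLabel(label: String, schema: Schema, nodes: Seq[Node]): List[Node] =
    scan(schema.filter { case (ls, _) => ls.contains(label) }, nodes)

  val nodes: List[Node] = List(
    Node(0L, Set("User", "Offline"), Map("username" -> "alice")),
    Node(1L, Set("User", "Online"), Map("username" -> "bob"))
  )
  val userOnly: Schema = Map(Set("User") -> Set("username"))
  val allCombos: Schema = Map(
    Set("User", "Offline") -> Set("username"),
    Set("User", "Online") -> Set("username")
  )

  def main(args: Array[String]): Unit = {
    // Schema declares only {User}: both multi-label nodes are skipped
    println(matchLabel("User", userOnly, nodes).map(_.props("username")))         // List()
    // Schema declares every combination: both users are returned
    println(matchLabel("User", allCombos, nodes).map(_.props("username")).sorted) // List(alice, bob)
  }
}
```

Under this model, a schema containing only `["User"]` yields an empty table for multi-label users, which matches what was reported in this issue.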
So if you want to query for all users, you have to describe every existing label combination that includes `:User` in the schema:
```json
{
  "version": 1,
  "labelPropertyMap": [
    {
      "labels": ["User", "Offline"],
      "properties": {
        "username": "STRING"
      }
    },
    {
      "labels": ["User", "Online"],
      "properties": {
        "username": "STRING"
      }
    }
  ],
  "relTypePropertyMap": []
}
```
This will then allow you to query for every existing `:User` node, just as you did above:
```cypher
FROM Neo4j.graph
MATCH (i:User)
RETURN i.username
```
This is of course not ideal, but will allow you to accomplish the job.
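Writing every combination by hand gets tedious, so one option is to generate the entries. The sketch below is a hypothetical helper (not part of Morpheus) that emits JSON in the same shape as the schema above for a list of label sets. It assumes the label sets were collected beforehand, e.g. with `MATCH (n:User) RETURN DISTINCT labels(n)`, that all combinations share the same properties, and that label and property names need no JSON escaping.

```scala
// Hypothetical generator for the "labelPropertyMap" layout shown above.
object SchemaJsonGen {
  // Wrap in double quotes; no escaping, so plain identifiers are assumed
  private def q(s: String): String = "\"" + s + "\""

  // One labelPropertyMap entry; labels and properties are sorted for stable output
  def entry(labels: Set[String], props: Map[String, String]): String = {
    val ls = labels.toList.sorted.map(q).mkString(", ")
    val ps = props.toList.sortBy(_._1).map { case (k, t) => s"${q(k)}: ${q(t)}" }.mkString(", ")
    s"""{"labels": [$ls], "properties": {$ps}}"""
  }

  // Full schema document with one entry per observed label combination
  def schemaJson(labelSets: Seq[Set[String]], props: Map[String, String]): String = {
    val entries = labelSets.map(entry(_, props)).mkString(",\n    ")
    s"""{
       |  "version": 1,
       |  "labelPropertyMap": [
       |    $entries
       |  ],
       |  "relTypePropertyMap": []
       |}""".stripMargin
  }

  def main(args: Array[String]): Unit = {
    val labelSets = Seq(Set("User", "Offline"), Set("User", "Online"))
    println(schemaJson(labelSets, Map("username" -> "STRING")))
  }
}
```

The resulting string could then be passed to `PropertyGraphSchema.fromJson` in the same way as the hand-written schema file.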