cypher-for-gremlin icon indicating copy to clipboard operation
cypher-for-gremlin copied to clipboard

Question: translation result cannot run on Janus Graph

Open yuzhebj opened this issue 5 years ago • 15 comments

First thanks for your contributions for this useful tool.

I translated cypher with parameters to gremlin script using this tool.

Cypher: MATCH (arch:Architecture{_id: {archId}}) RETURN arch AS arch

Translated gremlin script: g.V().as('arch').hasLabel('Architecture').where(__.choose(__.constant(archId), __.constant(archId), __.constant(' cypher.null')).is(neq(' cypher.null')).as(' GENERATED1').select('arch').values('_id').where(eq(' GENERATED1'))).select('arch').project('arch').by(__.choose(neq(' cypher.null'), __.valueMap(true)))

This gremlin script can run successfully on Neo4j. But after I switched db to JanusGraph, I cannot get result and got this error. image

I commit the gremlin script using following code in JavaScript. client.submit(gremlinScript, params)

And I also created index for _id property. image

Could anyone help with this issue? And what is the best practice for this kind of requirements?

yuzhebj avatar Jul 22 '19 08:07 yuzhebj

Hello @yuzhebj,

this behavior is related to parameter handling.

To avoid NullPointerException in case parameter has null value, parameter is wrapped into null guard .choose(constant(archId), constant(archId), constant(' cypher.null')). Thats why Gremlin can not fold .has('_id', ...) in .V() step to benefit from index and query is slow.

I agree that this translation is not optimal (although correct) and parameter validation should be configurable.

Until this is implemented, I can only suggest trying a workaround with manual inlining of parameters e.g: MATCH (arch:Architecture{_id: 2}) RETURN arch AS arch.

dwitry avatar Jul 23 '19 06:07 dwitry

Hello @dwitry , thanks very much for your reply.

I remove the parameters, and it works. Also the gremlin script is much shorter than before. 👍

But I still wonder whether there is some static variable or parameter can replace the 'cypher.null' in gremlin?

In addition, in my translation result, I have script like this by(__.constant('_id')).select(values).map(cypherContainerIndex())

and get error image

I cannot find any document about it. How can I rewrite this gremlin script? (I have refined my cypher script in several different ways, but still have this cypherContainerIndex() function in gremlin result)

yuzhebj avatar Jul 23 '19 07:07 yuzhebj

But I still wonder whether there is some static variable or parameter can replace the 'cypher.null' in gremlin?

Not at the moment. If you will use Java null, the traversal will fail with NPE. This might be improved in TinkerPop 3.5.

(I have refined my cypher script in several different ways, but still have this cypherContainerIndex() function in gremlin result

cypherContainerIndex() is custom function that allows accessing elements by index when type is unknown (refer to docs for more detailed explanation).

It is included in Extensions and Server Plugin.

Have you installed Server Plugin on your target Gremlin Server?

dwitry avatar Jul 23 '19 08:07 dwitry

@yuzhebj can you show Cypher query that translates with cypherContainerIndex() and produces this error?

dwitry avatar Jul 23 '19 08:07 dwitry

@dwitry I do have installed server plugin on my gremlin server for this extension.

I use java project to do the translation task offline (I have added CustomPredicate.java and CustomFunctions.java to my project, version of gremlin translation library is 0.9.13, because the latest version 1.0.2 has some feature not supported in JanusGraph).

The version of gremlin server is 3.4.1. I have installed the extension by bin/gremlin-server.sh install org.opencypher.gremlin cypher-gremlin-server-plugin

And I execute gremlin script in a nodejs project.

This is the cypher: MATCH (instance:TEST {_id: 'instanceId_PARAM'}) OPTIONAL MATCH (instance)-[rel *]->(node) WITH *, CASE WHEN node IS NOT NULL THEN {start: startNode(LAST(rel)), end: endNode(LAST(rel)), type: type(LAST(rel))} ELSE NULL END AS relTmp RETURN COLLECT(DISTINCT {start: relTmp.start._id, end: relTmp.end._id, type: relTmp.type}) AS rels

And this is the translation result: g.V().as('instance').hasLabel('TEST').has('_id', eq('instanceId_PARAM')).choose(__.select('instance').as(' cypher.path.start.GENERATED3').repeat(__.outE().as('rel').inV()).emit().until(__.path().from(' cypher.path.start.GENERATED3').count(local).is(gte(21))).as('node').optional(__.select(all, 'rel').as('rel')), __.select('instance').as(' cypher.path.start.GENERATED3').repeat(__.outE().as('rel').inV()).emit().until(__.path().from(' cypher.path.start.GENERATED3').count(local).is(gte(21))).as('node').optional(__.select(all, 'rel').as('rel')), __.constant(' cypher.null').as('rel').as('node')).where(__.choose(__.select('node').choose(__.is(neq(' cypher.null')), __.constant(true), __.constant(false)).is(eq(true)), __.project('start', 'end', 'type').by(__.select('rel').choose(__.tail(local, 1), __.tail(local, 1), __.constant(' cypher.null')).choose(neq(' cypher.null'), __.outV())).by(__.select('rel').choose(__.tail(local, 1), __.tail(local, 1), __.constant(' cypher.null')).choose(neq(' cypher.null'), __.inV())).by(__.select('rel').choose(__.tail(local, 1), __.tail(local, 1), __.constant(' cypher.null')).choose(neq(' cypher.null'), __.label().is(neq('vertex')))), __.constant(' cypher.null')).is(neq(' cypher.null'))).select('instance', 'node', 'rel').project('instance', 'node', 'rel', 'relTmp').by(__.select('instance')).by(__.select('node')).by(__.select('rel')).by(__.choose(__.select('node').choose(__.is(neq(' cypher.null')), __.constant(true), __.constant(false)).is(eq(true)), __.project('start', 'end', 'type').by(__.select('rel').choose(__.tail(local, 1), __.tail(local, 1), __.constant(' cypher.null')).choose(neq(' cypher.null'), __.outV())).by(__.select('rel').choose(__.tail(local, 1), __.tail(local, 1), __.constant(' cypher.null')).choose(neq(' cypher.null'), __.inV())).by(__.select('rel').choose(__.tail(local, 1), __.tail(local, 1), __.constant(' cypher.null')).choose(neq(' cypher.null'), __.label().is(neq('vertex')))), __.constant(' cypher.null'))).as(' GENERATED4').select('instance').as('instance').select(' GENERATED4').select('node').as('node').select(' GENERATED4').select('rel').as('rel').select(' GENERATED4').select('relTmp').as('relTmp').project('start', 'end', 'type').by(__.project(' GENERATED5', ' GENERATED6').by(__.project(' GENERATED7', ' GENERATED8').by(__.identity()).by(__.constant('start')).select(values).map(cypherContainerIndex())).by(__.constant('_id')).select(values).map(cypherContainerIndex())).by(__.project(' GENERATED9', ' GENERATED10').by(__.project(' GENERATED11', ' GENERATED12').by(__.identity()).by(__.constant('end')).select(values).map(cypherContainerIndex())).by(__.constant('_id')).select(values).map(cypherContainerIndex())).by(__.project(' GENERATED13', ' GENERATED14').by(__.identity()).by(__.constant('type')).select(values).map(cypherContainerIndex())).dedup().is(neq(' cypher.null')).fold().project('rels').by(__.identity())

yuzhebj avatar Jul 23 '19 09:07 yuzhebj

version of gremlin translation library is 0.9.13, because the latest version 1.0.2 has some feature not supported in JanusGraph)

Just to note that JanusGraph 0.4.0, is compatible with 1.0.0 and even 1.0.2 if you don't need Spark and OLAP capabilities.

And this is the translation result:

So cypherContainerIndex() index comes from relTmp.start._id: frontend does not know the type of relTmp so extraction of start is done in runtime via custom function. Same goes for extracting _id from start.

Considering you've installed the plugin, error message about missing signature of cypherContainerIndex() looks very strange. Does any other custom functions like toInteger or toString work?

If not, may I suggest you to try to import CustomFunctions explicitly in Gremlin Server configuration:

scriptEngines: {
  gremlin-groovy: {
    plugins: {  ...  },
    staticImports: ['org.opencypher.gremlin.traversal.CustomFunctions.*']
...

dwitry avatar Jul 23 '19 12:07 dwitry

@dwitry Thanks for your reply.

I updated Gremlin Server configuration like what you suggested and restart the gremlin server. But I still have this error... I also tried to install the server plugin in the gremlin server in JanusGraph 0.4.0. And also failed...

I also followed the manual steps and I can see [INFO] OpLoader - Adding the cypher OpProcessor. after started gremlin-server... Is there any other suggestion?

yuzhebj avatar Jul 25 '19 02:07 yuzhebj

@yuzhebj can you clarify if any other custom functions like toInteger or toString work?

dwitry avatar Jul 26 '19 07:07 dwitry

@dwitry I tried toString function, and it does NOT work via both gremlin console and nodejs client.

The cypher is MATCH (arch:Architecture {_id:'archId'}) RETURN toString(arch.number)

And the gremlin script is g.V().hasLabel('Architecture').has('_id', eq('archId')).project('toString(arch.number)').by(__.choose(neq(' cypher.null'), __.choose(__.values('number'), __.values('number'), __.constant(' cypher.null'))).map(cypherToString()))

I also got this error image

I followed this guide to install the gremlin extension (both the automated and manual installation parts)

yuzhebj avatar Jul 26 '19 08:07 yuzhebj

Ok, still sounds very strange to me, need more details to find cause. Only thing I can suggest is that you post your Gremlin Server configuration yaml here, and I'll try to reproduce this on Monday.

dwitry avatar Jul 26 '19 08:07 dwitry

@dwitry this is my gremlin server config yaml. And thanks very much! gremlin-server-yaml.zip

yuzhebj avatar Jul 26 '19 09:07 yuzhebj

@yuzhebj could you try to add org.opencypher.gremlin.server.jsr223.CypherPlugin: {} to scriptEngines.gremlin-groovy.plugins in your yaml file?

dwitry avatar Jul 29 '19 09:07 dwitry

@dwitry I think it works. I will try more gremlin scripts. Thanks so much! 👍👍

yuzhebj avatar Jul 30 '19 02:07 yuzhebj

@dwitry regarding this comment is it possible to disable parameter validation in current v1.0.4 release? Could the Translator.ParametrizedFlavorBuilder.inlineParameters() be useful here?

sbespalov avatar Nov 19 '19 01:11 sbespalov

Also, regarding the inlining parameters workaround, this make possible SQL injection vulnerability

sbespalov avatar Nov 19 '19 02:11 sbespalov