
Cannot get properties of Elements on remote graph with scalaGraph

Open TitiHl opened this issue 6 years ago • 37 comments

Hi,

Thanks for building this nice wrapper for Scala :D. I am currently using it against a remote JanusGraph by calling:

val scalaGraph: ScalaGraph = EmptyGraph.instance().asScala().configure(_.withRemote(DriverRemoteConnection.using(cluster, "g")))

but I found that I lose some of the gremlin-scala syntax benefits. For example, to add an edge between v1 and v2, I can no longer call:

val edge = v1 --- ("reference", metadata -> "EdgeTest", deleted -> false) --> v2

It throws the exception below:

Edge additions not supported
java.lang.IllegalStateException: Edge additions not supported
	at org.apache.tinkerpop.gremlin.structure.Vertex$Exceptions.edgeAdditionsNotSupported(Vertex.java:175)
	at org.apache.tinkerpop.gremlin.structure.util.reference.ReferenceVertex.addEdge(ReferenceVertex.java:47)
	at gremlin.scala.ScalaVertex.addEdge(ScalaVertex.scala:65)
	at gremlin.scala.SemiEdge.$minus$minus$greater(SemiEdge.scala:4)

I believe this is caused by using EmptyGraph as the underlying graph. Following this example: https://github.com/mpollmeier/gremlin-scala-examples/blob/master/dse-graph/src/test/scala/SimpleSpec.scala I instead have to call:

val a = StepLabel[Vertex]()
val b = StepLabel[Vertex]()
scalaGraph.V(v1.id).as(a).V(v2.id).as(b).addE(REFERENCE).from(a).to(b).property(metadata, "EdgeTest").property(deleted, false).iterate()

This is one example where I cannot use the nice wrapper syntax provided by gremlin-scala when working on a remote graph. I am wondering whether I missed something here, since I am still operating on a ScalaGraph, or whether there is a better way to add vertices/edges to a remote graph.

Thanks for your help in advance. Alex

TitiHl avatar Nov 27 '17 04:11 TitiHl

I also found that, with a remote graph, I have to call valueMap(true) on the GraphTraversalSource to get the properties: https://stackoverflow.com/questions/45764199/janusgraph-cluster-always-returns-vertex-without-properties-referencevertex

g.V().valueMap(true).toList()

But with a scalaGraph created like this:

val scalaGraph: ScalaGraph = EmptyGraph.instance().asScala().configure(_.withRemote(DriverRemoteConnection.using(cluster, "g")))

there is no way to pass valueMap(true). I am wondering what the best way is to get the properties of Elements using ScalaGraph in this setup.

TitiHl avatar Nov 28 '17 03:11 TitiHl

As you mentioned already, EmptyGraph is the problem. Simply use org.janusgraph.core.JanusGraph and everything should be fine.

Did you see https://github.com/mpollmeier/gremlin-scala-examples/ ? It contains a JanusGraph example repo and I just added a line to prove that you can add edges. valueMap works as well, e.g. if you add println(scalaGraph.V.valueMap.toList).
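A minimal sketch of the valueMap suggestion, run against a local in-memory TinkerGraph rather than a remote JanusGraph (that substitution, the object name ValueMapSketch and the "planet"/"name" data are assumptions made for illustration). The idea is to project the properties inside the traversal instead of reading them off the returned vertices, which over a remote connection come back as property-less references:

```scala
import gremlin.scala._
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph

// Hypothetical example object; not part of gremlin-scala or the examples repo.
object ValueMapSketch {
  val Name = Key[String]("name")

  /** Adds one vertex and returns one property map per vertex in the graph. */
  def propertyMaps() = {
    // Local stand-in for the remote graph (assumption for this sketch).
    implicit val graph: ScalaGraph = TinkerGraph.open.asScala
    graph + ("planet", Name -> "Saturn")
    // valueMap is evaluated as part of the traversal, so the properties
    // are part of the result instead of being looked up on the vertex.
    graph.V.valueMap.toList
  }

  def main(args: Array[String]): Unit =
    propertyMaps().foreach(println)
}
```

With a remote setup, only the construction of the graph value should need to change; the traversal itself stays the same.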

mpollmeier avatar Nov 28 '17 22:11 mpollmeier

Hi, thanks for your reply. The reason I use EmptyGraph is to initialise a remote graph. I assume I have to use EmptyGraph to connect to a remote graph, since all the docs do it this way, unless I missed a way to construct an EmptyGraph for JanusGraph.

I basically want to achieve exactly the same as the DSE example, but with JanusGraph: https://github.com/mpollmeier/gremlin-scala-examples/blob/master/dse-graph/src/test/scala/SimpleSpec.scala Somehow the DSE example returns a DetachedVertex, while JanusGraph returns a ReferenceVertex, on which I cannot add edges.

Thanks, Alex

TitiHl avatar Nov 28 '17 23:11 TitiHl

Ok, I understand the issue now. I don't have much capacity to fiddle with this myself, but just had a brief look at Janus' documentation. Have you tried to pass the remote url etc. in the config that you pass to JanusGraphFactory, rather than using EmptyGraph.withRemote? E.g.

storage.backend=cassandra
storage.hostname=localhost

http://docs.janusgraph.org/latest/configuration.html

mpollmeier avatar Nov 29 '17 21:11 mpollmeier

Hi,

The cluster points to the conf file for the remote JanusGraph Server, and the JanusGraph Server holds all the storage backend etc. settings. But you are right, maybe I can specify the JanusGraph backend explicitly so that I get something more than an EmptyGraph. Will try this out.

Thanks for your help here. Cheers, Alex

TitiHl avatar Nov 30 '17 07:11 TitiHl

To follow up on this old issue, I believe I have similar problems. Well, similar in that I found I can not use the fancy operators with remote connections (if I do the data is not saved).

I have forked gremlin-scala-examples to show this: I'm using JanusGraph Server as my server here, but whatever it's mostly just Gremlin Server underneath: rwilcox/gremlin-scala-examples JanusGraph example for JanusGraph Server

There are three examples here:

  1. Java copied from Janusgraph's official remote example

  2. an operator ( / structure) based API example with a remote JanusGraph, copied from gremlin-scala janusgraph example

  3. a traversal API based example, where I (poorly!) try to convert the edge / vertices operator API -> traversal API.

(@TitiHl these all use JanusGraphFactory.open("inmemory") instead of EmptyGraph(), which I found too limiting ie not supporting transactions etc etc)

TL; DR:

  • connect to JanusGraph Server: val graph : ScalaGraph = JanusGraphFactory.open("inmemory").configure( _.withRemote( conf) )
  • graph + ( "Saturn", Key[String]("name") -> "Saturn" )
  • JanusGraph Server data store is unchanged

BUT, I used the addV traversal methods, like so:

  • connect to JanusGraph Server: val graph : ScalaGraph = JanusGraphFactory.open("inmemory").configure( _.withRemote( conf) )
  • graph.addV().property( Key[String]("name"), "Saturn" ).iterate()
  • JanusGraph Server data store is changed

By going through issue history, I found #118, which has the following comment link - which is one of the reasons why graph.addV() exists at all!

ScalaGraph does have addV (I just called it addVertex). Also note that we have a nicer syntax to add vertices/edges, you might want to use that instead (it's documented on the front page (readme))

I think the difference is that addVertex on a Graph instance does not create a traversal, it operates directly on the Graph where addV operates on the traversal source and creates a traversal.

But methods like + and --- operate on ScalaGraph objects (calling addVertex), not the underlying traversal source object.

The reason why operating on a graph vs operating on a traversal is important is because it seems to be that the (best? only?) way to connect to Gremlin ... err JanusGraph... Server is via TraversalSource's withRemote method.
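To make the graph vs. traversal distinction concrete, here is a small sketch that calls the plain TinkerPop API from Scala. It uses a local in-memory TinkerGraph (an assumption for illustration; the object and method names are hypothetical too), where both calls succeed. The point is only which object each call goes through: the first never touches the traversal source, which is where withRemote is configured.

```scala
import org.apache.tinkerpop.gremlin.structure.T
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph

// Hypothetical example object, for illustration only.
object GraphVsTraversal {
  /** Adds one vertex through each API and returns the resulting vertex count. */
  def addBothWays(): Long = {
    val graph = TinkerGraph.open() // local graph instance
    val g = graph.traversal()      // traversal source; withRemote is configured on this

    // Structure API: mutates the Graph instance directly, no traversal is built.
    graph.addVertex(T.label, "planet", "name", "Jupiter")

    // Traversal API: builds a traversal and executes it via the traversal source.
    g.addV("planet").property("name", "Saturn").iterate()

    g.V().count().next() // both vertices are visible locally
  }

  def main(args: Array[String]): Unit =
    println(addBothWays())
}
```

If the reasoning above is right, then with a remote-configured traversal source only the addV form would reach the server, which would match the unchanged data store seen with the + operator.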

@mpollmeier does this logic sound right to you? (I'm a relative newbie to this project and graph / tinkerpop in general)

@TitiHl : I have not tested the original bug with this configuration (JanusGraphFactory.open("inmemory")) vs the other , but that may solve your problem ???

In general, it would be great to somehow have the + or --- operators also work on gremlin.scala.TraversalSource objects, instead of just ScalaGraph objects. (Is there a way to force this??)

rwilcox avatar Feb 01 '18 16:02 rwilcox

Interesting - I'll run this tomorrow and see if I can find a workaround. To make sure we're on the same page: how exactly did you start janusgraph? I'm just downloading janusgraph-0.2.0-hadoop2.zip from https://github.com/JanusGraph/janusgraph/releases/.

The docs suggest to run gremlin.sh and then graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-es.properties') - is this what you did?

mpollmeier avatar Feb 03 '18 07:02 mpollmeier

Awesome, thanks! Take a look at the JanusGraph Server Getting Started, but TL;DR: use bin/janusgraph.sh start <-- should work out for you

rwilcox avatar Feb 03 '18 18:02 rwilcox

Ok so it turns out that we shouldn't use Graph to add elements, and instead always use the Traversal. This doesn't impact local graphs (one edge case though: the user cannot provide the element id), and is the only way to handle remote graphs, as you had to figure out yourself painfully.

I've made a start to change everything to use a traversal (only for vertices so far) in https://github.com/mpollmeier/gremlin-scala/commit/f74078858954969848caa47d7186a2f767456520 - let me know your thoughts.

So that I can actually test this, maybe you can help me with the following: when I run your test cases, I get the following error:

- janusgraph server ported Java (from janusgraph-server example) *** FAILED ***
java.util.concurrent.CompletionException: io.netty.handler.codec.DecoderException: 
org.apache.tinkerpop.gremlin.driver.ser.SerializationException: 
org.apache.tinkerpop.shaded.kryo.KryoException: Buffer underflow.

Any ideas what's wrong? Some missing configuration?

Thanks for bringing this back up and providing a nice project to reproduce, @rwilcox

Other random thoughts:

  • you're building up a ClusteredClient but don't actually use it...
  • IMO using janusgraphfactory.open(inmemory) is misleading, it gives you the (false) sense that you can actually use that graph instance. Use EmptyGraph instead

mpollmeier avatar Feb 04 '18 03:02 mpollmeier

Woh, awesome! I'll take a look at the changes probably tomorrow,

(And that buffer underflow error sounds familiar too - I can't place it but I'll check it out at work tomorrow... maybe there it will come to me).

I have answers to your random thoughts now:

you're building up a ClusteredClient but don't actually use it

Yes. From reading sample code and docs, and from code provided to me by others on my Current Graph Database Project, I believe the ClusteredClient etc. provides JanusGraph-specific management features, i.e. the ability to create indexes to speed up searching, schemas, etc. But I only learned this in the last day or so. (And yes, I don't actually do those things in the sample code.)

... IMO using janusgraphfactory.open(inmemory) is misleading, it gives you the (false) sense that you can actually use that graph instance. Use EmptyGraph instead

Maybe. What I believe / assume is happening is that creating a traversal off an EmptyGraph will give you only features available in generic Gremlin Server, but basing the traversal off a JanusGraph gives you JanusGraph features.

I'm super interested in what the local graph instance is used for in remote traversal situations: is it just a bootstrap mechanism or is it used somehow ie does it hold a subgraph in memory for cache reasons????? I may go ask the JanusGraph people, as my lead engineer had similar questions (ie if it is used for something like caching, that may have memory implications for mid to large graphs).

rwilcox avatar Feb 04 '18 14:02 rwilcox

It would certainly be a good idea to use the graph instance for some local caching, but I don't think it's doing that, instead it just seems to be a bootstrap for the traversal..

mpollmeier avatar Feb 04 '18 20:02 mpollmeier

@rwilcox any news re the DecoderException? Can you reproduce it?

mpollmeier avatar Feb 05 '18 19:02 mpollmeier

any news re the DecoderException? Can you reproduce it?

No, and my browser history and notes didn't help either :(

rwilcox avatar Feb 05 '18 20:02 rwilcox

'No' as in, if you run the test locally it works, and you don't get that exception? If so, what exactly did you do? I downloaded the 0.2.0-hadoop2 release, unpacked and ran bin/janusgraph.sh -v start, and then ran the test.

mpollmeier avatar Feb 05 '18 20:02 mpollmeier

'No' as in, if you run the test locally it works, and you don't get that exception? If so, what exactly did you do? I downloaded the 0.2.0-hadoop2 release, unpacked and ran bin/janusgraph.sh -v start

Correct - on OS X 10.12 with JAVA_HOME set to a 1.8 JVM, I ran bin/janusgraph.sh -v start then ran my tests one by one in IntelliJ. No error. (Are you using JVM 1.7 or 1.9 maybe???????)

rwilcox avatar Feb 05 '18 22:02 rwilcox

How about if you run it in sbt?

I'm on linux with java 1.8

java -version
openjdk version "1.8.0_144"
OpenJDK Runtime Environment (build 1.8.0_144-b01)
OpenJDK 64-Bit Server VM (build 25.144-b01, mixed mode)

I just freshly unpacked janusgraph and ran sbt test. Output on janusgraph console:

27043 [gremlin-server-worker-1] ERROR org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor  - Could not deserialize the Traversal instance

and sbt

*** 3 TESTS FAILED ***

mpollmeier avatar Feb 05 '18 23:02 mpollmeier

Ok, super weird: sbt test gives me the error too. Was not expecting that (given my success with IntelliJ)

rwilcox avatar Feb 06 '18 14:02 rwilcox

Ok, super interesting. In my IntelliJ test configuration there's a checkbox to "use SBT". It was off. When I checked it to be on I got the same error in IntelliJ.

I guess I can see the Scala IntelliJ plugin somehow wanting to bypass sbt for Reasons by default

rwilcox avatar Feb 06 '18 15:02 rwilcox

That's good news, we're getting the same results :) Let me know when you get to the bottom of the error, maybe the working setup with intellij can help? Maybe there's a difference in the classpath?

mpollmeier avatar Feb 06 '18 22:02 mpollmeier

Hey, I'm also interested in this. I got a similar SerializationException running some slightly different code. After some digging I solved it by explicitly specifying the serializer when creating the cluster like so:

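import org.apache.tinkerpop.gremlin.driver.Cluster
import org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0
import org.apache.tinkerpop.gremlin.structure.io.gryo.GryoMapper
import org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry

// Register JanusGraph's custom types with Gryo so the driver can (de)serialize them
private def buildCluster(): Cluster = {
  val serializer = new GryoMessageSerializerV1d0(GryoMapper.build().addRegistry(JanusGraphIoRegistry.getInstance()))
  Cluster.build().addContactPoint("localhost").port(45679).serializer(serializer).create()
}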

Hope it helps!

alicefuzier avatar Feb 17 '18 14:02 alicefuzier

For what it's worth, I'm running into the same issues trying to connect to a new Amazon Neptune GraphDB Cluster.

val builder: Cluster.Builder = Cluster.build()
  builder.addContactPoint("my-endpoint.amazonaws.com")
  builder.port(8182)
val cluster: Cluster = builder.create()
val graph = EmptyGraph.instance().asScala().configure(_.withRemote(DriverRemoteConnection.using(cluster)))

Gives the same errors: (Empty)Graph does not support adding vertices

apatzer avatar Mar 02 '18 02:03 apatzer

@alicefuzier thanks for sharing, but that didn't fix the exception I'm getting:

io.netty.handler.codec.DecoderException: org.apache.tinkerpop.gremlin.driver.ser.SerializationException: org.apache.tinkerpop.shaded.kryo.KryoException: Buffer underflow.

I don't know much about Janus and its serialisation unfortunately.

@apatzer that's the error you get when you add a vertex with graph.addVertex, or graph + someCaseClass. Until this is resolved, the workaround is to add your vertex in a traversal, i.e. using the addV step in GremlinScala. Note: case classes aren't yet supported for that.

mpollmeier avatar Mar 02 '18 03:03 mpollmeier

Ok I just figured out how to connect to janusgraph. Use a different serialiser.

hosts: [localhost]
port: 8182
serializer: {
    className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0,
    config: {
        ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry]
    }
}

I'll continue with the changes to allow adding vertices to a remotegraph shortly.

mpollmeier avatar Mar 03 '18 03:03 mpollmeier

I just found some time to dig deeper into this. The underlying problem is that the configuration for remote is not stored in the graph instance, but in the TraversalSource. Because of that, one cannot simply call e.g. vertex.addEdge any more, because that doesn't know about the TraversalSource, and therefore the remote graph. Since IMO the graph instance should hold that information (ScalaGraph does by holding onto the TraversalSource), I decided to add that as an implicit for the arrow DSL. I.e. from now on you need to have an implicit ScalaGraph in scope, then the arrow DSL works fine with remote and local graphs.
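A minimal sketch of what "an implicit ScalaGraph in scope" looks like, assuming gremlin-scala 3.3.1.2 or later and a local in-memory TinkerGraph standing in for the remote graph. The object name, the "document" label and the key names are illustrative; the keys mirror the ones from the original question.

```scala
import gremlin.scala._
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph

// Hypothetical example object, for illustration only.
object ArrowDslSketch {
  val Metadata = Key[String]("metadata")
  val Deleted  = Key[Boolean]("deleted")

  /** Creates two vertices and connects them using the arrow DSL. */
  def addReference(): Edge = {
    // The arrow DSL resolves this implicit to find the traversal source.
    // For a remote graph this would presumably be the value built with
    // EmptyGraph.instance.asScala.configure(_.withRemote(...)) instead.
    implicit val graph: ScalaGraph = TinkerGraph.open.asScala

    val v1 = graph + "document"
    val v2 = graph + "document"
    v1 --- ("reference", Metadata -> "EdgeTest", Deleted -> false) --> v2
  }

  def main(args: Array[String]): Unit =
    println(addReference())
}
```

Without the implicit val, the --> call should fail to compile in these versions, which makes a missing graph visible at build time rather than as a runtime "Edge additions not supported" exception.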

I just released gremlin-scala 3.3.1.2 and provided a working example for gremlin-server. I'm still fighting with janusgraph (the basic setup is here), and assume I need to release a new version for 3.3.0, since Janusgraph hasn't released anything for 3.3.1 yet.

mpollmeier avatar Mar 25 '18 08:03 mpollmeier

It looks like I'm running into a similar issue even with 3.3.1.2 when interacting with a Neptune graph.

org.apache.tinkerpop.shaded.kryo.KryoException: Buffer underflow.
val cluster = Cluster.build()
      .addContactPoint(url)
      .port(port)
      .create()
implicit val g = EmptyGraph.instance().asScala
                      .configure(_.withRemote(DriverRemoteConnection.using(cluster, "g")))

object Name extends Key[String]("name")
// this succeeds
g.addV("Node").property(Name, "N/A").valueMap.head()
try {
   //this triggers the error
    g.addV("Node").property(Name, "N/A").head()
} catch {
   // error also occurs for the below expression
    case _: KryoException => g.addVertex("Node", Name.name -> "N/A")
}

Have you found any solutions other than changing the serializer? Neptune does not have such an IORegistry published as far as I can tell

hudsonmd avatar Mar 29 '18 14:03 hudsonmd

Turns out it was user error.. You can modify my above example to add this unmodified serializer and it will function properly

 val cluster = Cluster.build()
      .addContactPoint(url)
      .port(port)
      .serializer(new GraphSONMessageSerializerV3d0())
      .create()

Thanks for the great work bringing gremlin to scala!

hudsonmd avatar Mar 29 '18 18:03 hudsonmd

Quick update re JanusGraph: since its last release (0.2.0) is still based on tinkerpop 3.2.x I can't backport the new model for handling this in remote graphs, because it relies on GraphTraversal.from(Vertex) which was only introduced in 3.3.x. I'll only provide a working JanusGraph example when they release a new version.

mpollmeier avatar Apr 01 '18 00:04 mpollmeier

A quick update... JanusGraph 0.3.0 was released on July 31, 2018. It now supports tinkerpop 3.3.3.

jeremysears avatar Aug 14 '18 15:08 jeremysears

Finally got around to setting up a remote janusgraph example: https://github.com/mpollmeier/gremlin-scala-examples/blob/fcc048e/janusgraph/src/test/scala/SimpleSpec.scala#L55

I found debugging the serialisers non-straightforward, but here's a setup that works:

val serializer = new GryoMessageSerializerV3d0(GryoMapper.build.addRegistry(JanusGraphIoRegistry.getInstance))
val cluster = Cluster.build.addContactPoint("localhost").port(8182).serializer(serializer).create
implicit val graph = EmptyGraph.instance.asScala.configure(_.withRemote(DriverRemoteConnection.using(cluster)))

mpollmeier avatar Sep 15 '18 12:09 mpollmeier

@mpollmeier Actually, this problem is still relevant for Amazon Neptune. At least I have no idea how to initialize the connection in a way that works.

voroninp avatar Sep 15 '18 13:09 voroninp