neo4j-jdbc icon indicating copy to clipboard operation
neo4j-jdbc copied to clipboard

Testing in Rapidminer

Open whitfa99 opened this issue 9 years ago • 14 comments

Using jdbc Release 3.0.1 on Neo4j 3.0.5, using the (GUI) Read Database operator within Rapidminer. Connection string is jdbc:neo4j:bolt://localhost:7687/?debug=false,noSsl,flatten=1000

The resultset for this simple cypher statement looks promising:

Match (n:mynode) WHERE NOT n.aninteger is NULL RETURN n.aninteger

..however it fails when I comment out the second line:

org.neo4j.driver.v1.exceptions.value.Uncoercible: Cannot coerce NULL to Java double

However, it can handle null values of a sring field just fine in the output.

Also, if I try just this:

Match (n:mynode) RETURN n

Then I get: org.neo4j.driver.v1.exceptions.value.LossyCoercion: Cannot coerce FLOAT to Java float without losing precision

Although, I'm not yet sure which field it's referring to here.

Also, where the Neo4j browser would happily return arrays, the driver doesn't (the field is just null).

whitfa99 avatar Sep 13 '16 16:09 whitfa99

Hi @whitfa99, I'll investigate and ping you back soon. It'd greatly simplify our work if you could create a test graph that certifies the problem.

inserpio avatar Sep 13 '16 19:09 inserpio

Also could you please send us the Java code so that we can see the way you're using the driver?

inserpio avatar Sep 13 '16 19:09 inserpio

I can reproduce the first error using the domain created by this:

UNWIND [{name:'A', aninteger:1}, {name:'B',aninteger:2}, {name:'C',aninteger:null}, {name:'D',aninteger:4}] AS properties CREATE (n:mynode) SET n = properties RETURN n

Interestingly, while reproducing this for you I first only created the 'aninteger' property, which produces the expected result when using:

Match (n:mynode) WHERE NOT n.aninteger is NULL RETURN n.aninteger

,but when you exclude the 2nd line here the result set is

n.aninteger null null null null

If instead you create the nodes with both 'name' and 'aninteger' properties and retry the same then it throws the Cannot coerce NULL to Java double error.

As far as the java code is concerned I'm not entirely sure because I've only added this project's jar file (neo4j-jdbc-driver-3.0.1.jar) to the classpath of the Rapidminer application and utilised it's GUI to create the jdbc connection as described here.

Full Disclosure: I am using Rapidminer 5.31, which was the last fully open source version (imho). I don't imagine it's changed much though. Also, I ended up deleting com.fasterxml.* from the jar because it was meddling with a v2.4.0 of the same package elsewhere in the classpath.

If that last bit has caused the anomaly I will gladly apologise and go crawl back under my rock.

whitfa99 avatar Sep 13 '16 20:09 whitfa99

Thanks for the additional info @whitfa99.

com.fasterxml.* is used for the HTTP protocol. If you just need to use the BOLT one (which is faster) then you could bring the protocol-specific jar (please have a look at the documentation here). The jar is downloadable here. This can save you from loading the all-protocols driver neo4j-jdbc-driver-3.0.1.jar and removing unwanted pieces inside it.

I'm keen to think that your config should be ok, but I still need to have a look at Rapidminer (I have never used it). In the meantime, if it sounds reasonable for you, you can keep the ResultSet clean either by adding the WHERE NOT n.aninteger is NULL or by coalescing the result like this RETURN coalesce(n.aninteger, 0).

Lorenzo

inserpio avatar Sep 13 '16 22:09 inserpio

Thanks for the tip Lorenzo. The domain I'm using in this thread is just for reproducing the error. I can't do without those nulls :) Perhaps if it's feasible, I'll hold out for a fix, assuming it's also something that affects other people. I can see that Knime is also mentioned elsewhere here and Rapidminer is in the same space.

I'm looking to replace my existing neo4j>rapidminer method, which currently involves me using the http REST interface and then a rather slow XML deserialiser. Hoping to replace with something less convoluted and more "Bolty-er" :)

whitfa99 avatar Sep 13 '16 22:09 whitfa99

I think the problem is that the "int" type is primitive and doesn't allow for nulls at all.

We would have to coerce it to 0 then instead of null, but not sure how common that case is. Not sure what rapidminer uses internally to access the fields.

perhaps if you can get any access to stack traces / logs of rapidminer that would be helpful.

Not sure how that's solved in SQL.

Did you try return toString(n.aninteger)

jexp avatar Sep 14 '16 08:09 jexp

I configured (for the very first time Rapidminer) like that:

  1. Defined neo4j driver params + uploaded the jar file:
manage database drivers
  1. Defined a connection to a specific instance of neo4j (localhost):
manage database connection
  1. Defined a very basic Rapidminer process that executes a cypher query and automatically print the result:
rapidminer process
  1. Defined the query to be executed match (n:mynode) return coalesce(n.aninteger, 0):
cypher query
  1. Run the process:
process execution result

So coalescing could be a temporary workaround...

If I remove the coalesce like this match (n:mynode) return n.aninteger, I get the above mentioned exception:

screen shot 2016-09-14 at 23 13 15

For completeness, we have to implement (as it's in the roadmap) some metadata methods for schema, because I'm observing this exception also:

screen shot 2016-09-14 at 23 15 58

@whitfa99 I can rapidly provide you a patch for that, but a new 3.1 release is planned for half if October. This mean I'll give you a snapshot jar.

inserpio avatar Sep 14 '16 21:09 inserpio

That would be fantastic Lorenzo. Thanks for your suggestions guys.

whitfa99 avatar Sep 15 '16 08:09 whitfa99

Hey @whitfa99,

can you try to replace the 3.0.1 driver jar with this 3.1-SNAPSHOT version and see if it sounds good for you? https://github.com/neo4j-contrib/neo4j-jdbc/releases/download/3.1-SNAPSHOT/neo4j-jdbc-bolt-3.1.0-SNAPSHOT.jar

Let me know! Thanks, Lorenzo

inserpio avatar Sep 15 '16 10:09 inserpio

Thank you - this looks promising! I no longer get the Cannot coerce NULL to Java double error. The fields/attributes/columns are being typed/cast correctly as far as I can see, with the exception of any column which has all null values for the returned rows, whereas in my version of Rapidminer at least, it then defaults(?) to nominal (string).

I am however getting a NPE when the recordset should otherwise return empty rows such as:

Match (n:mynode) WHERE n.aninteger = 5 RETURN n.aninteger, n.name

Also, I've run some other queries and I'm seeing some odd behaviour. Is there perhaps some code that samples X rows and decides that if a column has been null then it should be for the whole dataset? That's what I'm guessing. I have simplified one of my queries like this (you don't have the domain for this):

Match (n:mynode)-[r]-() WHERE NOT r.adouble is NULL RETURN r.adouble

Which returns 299 rows of populated values with Rapidminer typing it as Real. If I comment out the WHERE statement, then I get 536 rows of nulls (299 of them should not be!), with the Type now being Nominal. If I run the same statement in the Neo4j Browser, then I get all 536 rows including 299 populated values, the rest are null, the first 190 of these nulls are in the first rows.

If I then change the order in the statement in Rapidminer like this:

Match (n:mynode)-[r]-() RETURN r.adouble ORDER BY id(r)

Then I get the 299 populated values, plus the 237 nulls = 536 records as expected (the same as the output in Neo4j Browser). Presumably, this is because the first 190 field values are no longer all null.

Apart from that, all looks good and runs very quickly.

Also, are arrays in plan to be handled in a similar way as the browser? I don't currently get any output of them using the driver in this way.

Adrian

whitfa99 avatar Sep 15 '16 14:09 whitfa99

@whitfa99 if you could share a minimal dataset to work on together then it'd be great.

inserpio avatar Sep 15 '16 18:09 inserpio

This dataset also works:

WITH REDUCE (Build=[], n in range(1,500)|Build+{NodeNo:n,adouble:CASE WHEN n>200 THEN n*0.01 END}) as mapcollection UNWIND mapcollection AS map CREATE (n:mynode) SET n = map RETURN n

If you then run this in Rapidminer:

Match (n:mynode) RETURN n.NodeNo,n.adouble ORDER BY n.NodeNo DESC

Then you get what you expect, however if you run this:

Match (n:mynode) RETURN n.NodeNo,n.adouble ORDER BY n.NodeNo ASC

Then you only get null values for n.adouble

whitfa99 avatar Sep 15 '16 19:09 whitfa99

Great, thank you. Let me try.

inserpio avatar Sep 15 '16 19:09 inserpio

This works correctly in version 6, see

try (var result = stmt.executeQuery("MATCH (n:mynode) RETURN n.aninteger ORDER BY n.order_prop ASC")) {
	assertThat(result.next()).isTrue();
	assertThat(result.getInt(1)).isEqualTo(42);
	assertThat(result.next()).isTrue();
	assertThat(result.getInt(1)).isZero();
	assertThat(result.wasNull()).isTrue();
	assertThat(result.next()).isFalse();
}

I cannot test in RapidMiner, as the latest version of the product only works with a quite dated JDK 11, therefor I keep this issue open.

michael-simons avatar Mar 01 '24 10:03 michael-simons