neo4j-jdbc
Testing in Rapidminer
Using JDBC release 3.0.1 on Neo4j 3.0.5, with the (GUI) Read Database operator within Rapidminer. The connection string is jdbc:neo4j:bolt://localhost:7687/?debug=false,noSsl,flatten=1000
The result set for this simple Cypher statement looks promising:
Match (n:mynode) WHERE NOT n.aninteger is NULL RETURN n.aninteger
...however it fails when I comment out the second line (the WHERE clause):
org.neo4j.driver.v1.exceptions.value.Uncoercible: Cannot coerce NULL to Java double
However, it can handle null values of a string field just fine in the output.
Also, if I try just this:
Match (n:mynode) RETURN n
Then I get: org.neo4j.driver.v1.exceptions.value.LossyCoercion: Cannot coerce FLOAT to Java float without losing precision
Although I'm not yet sure which field it's referring to here.
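For what it's worth, the LossyCoercion complaint presumably comes from narrowing: Neo4j's FLOAT type is a 64-bit double, and squeezing one into a 32-bit Java float can silently drop digits. A minimal plain-Java illustration of just the narrowing (the stored value here is made up, not from my graph):

```java
public class LossyCoercionDemo {
    public static void main(String[] args) {
        // A Neo4j FLOAT property arrives as a 64-bit double; asking the
        // driver for a 32-bit float forces a narrowing conversion.
        double stored = 0.1234567890123; // made-up value for illustration
        float narrowed = (float) stored;
        boolean lossless = ((double) narrowed == stored);
        System.out.println(lossless); // prints false: the round-trip loses digits
    }
}
```

So any double-valued property with more precision than a float can hold would trigger that error.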
Also, where the Neo4j browser would happily return arrays, the driver doesn't (the field is just null).
Hi @whitfa99, I'll investigate and ping you back soon. It'd greatly simplify our work if you could create a test graph that demonstrates the problem.
Also could you please send us the Java code so that we can see the way you're using the driver?
I can reproduce the first error using the domain created by this:
UNWIND [{name:'A', aninteger:1}, {name:'B',aninteger:2}, {name:'C',aninteger:null}, {name:'D',aninteger:4}] AS properties CREATE (n:mynode) SET n = properties RETURN n
Interestingly, while reproducing this for you I first only created the 'aninteger' property, which produces the expected result when using:
Match (n:mynode) WHERE NOT n.aninteger is NULL RETURN n.aninteger
but when you exclude the WHERE clause here, the result set is
n.aninteger
null
null
null
null
If instead you create the nodes with both 'name' and 'aninteger' properties and retry the same then it throws the Cannot coerce NULL to Java double error.
As far as the Java code is concerned, I'm not entirely sure, because I've only added this project's jar file (neo4j-jdbc-driver-3.0.1.jar) to the classpath of the Rapidminer application and utilised its GUI to create the JDBC connection as described here.
Full disclosure: I am using Rapidminer 5.31, which was the last fully open source version (imho). I don't imagine it's changed much though. Also, I ended up deleting com.fasterxml.* from the jar because it was clashing with v2.4.0 of the same package elsewhere in the classpath.
If that last bit has caused the anomaly I will gladly apologise and go crawl back under my rock.
Thanks for the additional info @whitfa99.
com.fasterxml.* is used for the HTTP protocol. If you only need the BOLT protocol (which is faster), you could use the protocol-specific jar instead (please have a look at the documentation here; the jar is downloadable here). That saves you from loading the all-protocols driver neo4j-jdbc-driver-3.0.1.jar and removing unwanted pieces from it.
I'm inclined to think your config is OK, but I still need to have a look at Rapidminer (I have never used it). In the meantime, if it sounds reasonable to you, you can keep the ResultSet clean either by adding the WHERE NOT n.aninteger is NULL clause or by coalescing the result like this: RETURN coalesce(n.aninteger, 0).
Lorenzo
Thanks for the tip Lorenzo. The domain I'm using in this thread is just for reproducing the error. I can't do without those nulls :) Perhaps if it's feasible, I'll hold out for a fix, assuming it's also something that affects other people. I can see that Knime is also mentioned elsewhere here and Rapidminer is in the same space.
I'm looking to replace my existing neo4j>rapidminer method, which currently involves me using the http REST interface and then a rather slow XML deserialiser. Hoping to replace with something less convoluted and more "Bolty-er" :)
I think the problem is that the "int" type is primitive and doesn't allow for nulls at all.
We would then have to coerce it to 0 instead of null, but I'm not sure how common that case is. I'm also not sure what Rapidminer uses internally to access the fields.
Perhaps, if you can get any access to stack traces / logs of Rapidminer, that would be helpful.
Not sure how that's solved in SQL.
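As a plain-Java illustration (independent of the driver), unboxing shows why a primitive int simply has no room for null; this is also why the JDBC contract has getInt() return 0 for SQL NULL and flags it via a separate wasNull() check:

```java
public class PrimitiveNullDemo {
    public static void main(String[] args) {
        Integer boxed = null; // stands in for a NULL property coming back from the graph
        try {
            int primitive = boxed; // auto-unboxing a null Integer throws
            System.out.println(primitive);
        } catch (NullPointerException e) {
            // A primitive int has no null representation at all, so any API
            // returning a primitive must pick a sentinel such as 0.
            System.out.println("cannot unbox null into int");
        }
    }
}
```

So coercing to 0 plus a wasNull()-style flag would be the SQL-conventional answer.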
Did you try RETURN toString(n.aninteger)?
I configured Rapidminer (for the very first time) like this:
- Defined neo4j driver params + uploaded the jar file:
- Defined a connection to a specific instance of neo4j (localhost):
- Defined a very basic Rapidminer process that executes a cypher query and automatically print the result:
- Defined the query to be executed: match (n:mynode) return coalesce(n.aninteger, 0)
- Run the process:
So coalescing could be a temporary workaround...
If I remove the coalesce, like this: match (n:mynode) return n.aninteger, I get the above-mentioned exception:
For completeness, we still have to implement some schema metadata methods (it's on the roadmap), because I'm also observing this exception:
@whitfa99 I can rapidly provide you a patch for that, but a new 3.1 release is planned for mid-October. This means I'll give you a snapshot jar.
That would be fantastic Lorenzo. Thanks for your suggestions guys.
Hey @whitfa99,
can you try to replace the 3.0.1 driver jar with this 3.1-SNAPSHOT version and see if it sounds good for you? https://github.com/neo4j-contrib/neo4j-jdbc/releases/download/3.1-SNAPSHOT/neo4j-jdbc-bolt-3.1.0-SNAPSHOT.jar
Let me know! Thanks, Lorenzo
Thank you - this looks promising! I no longer get the Cannot coerce NULL to Java double error. The fields/attributes/columns are being typed/cast correctly as far as I can see, with the exception of any column whose returned rows are all null; in my version of Rapidminer at least, such a column then defaults(?) to nominal (string).
I am, however, getting an NPE when the recordset should otherwise return empty rows, such as:
Match (n:mynode) WHERE n.aninteger = 5 RETURN n.aninteger, n.name
Also, I've run some other queries and I'm seeing some odd behaviour. Is there perhaps some code that samples X rows and decides that if a column is null throughout the sample then it must be null for the whole dataset? That's what I'm guessing. I have simplified one of my queries like this (you don't have the domain for this):
Match (n:mynode)-[r]-() WHERE NOT r.adouble is NULL RETURN r.adouble
This returns 299 rows of populated values, with Rapidminer typing the column as Real. If I comment out the WHERE clause, I instead get 536 rows of nulls (299 of them should not be!), with the type now being Nominal. If I run the same statement in the Neo4j Browser, I get all 536 rows, including the 299 populated values; the rest are null, and 190 of those nulls occupy the first rows.
If I then change the order in the statement in Rapidminer like this:
Match (n:mynode)-[r]-() RETURN r.adouble ORDER BY id(r)
Then I get the 299 populated values, plus the 237 nulls = 536 records as expected (the same as the output in Neo4j Browser). Presumably, this is because the first 190 field values are no longer all null.
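Purely as a sketch of my guess (this is not RapidMiner's actual code; inferType and SAMPLE_SIZE are names I've invented), sampling only the leading rows would produce exactly this Nominal/Real flip:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SamplingGuessDemo {
    // Invented constant: just large enough to cover the 190 leading nulls.
    static final int SAMPLE_SIZE = 200;

    // Hypothetical type sniffer: inspect only the first SAMPLE_SIZE rows and
    // report "Real" if any non-null value is seen, otherwise fall back to the
    // string-like "Nominal". Again, this is a guess, not RapidMiner's code.
    static String inferType(List<Double> column) {
        int limit = Math.min(SAMPLE_SIZE, column.size());
        for (int i = 0; i < limit; i++) {
            if (column.get(i) != null) return "Real";
        }
        return "Nominal";
    }

    public static void main(String[] args) {
        List<Double> column = new ArrayList<>();
        for (int i = 0; i < 237; i++) column.add(null);      // the 237 null rows come first
        for (int i = 0; i < 299; i++) column.add(0.01 * i);  // then the 299 populated rows
        System.out.println(inferType(column)); // prints Nominal: the sample is all null

        Collections.reverse(column); // reorder so populated values lead
        System.out.println(inferType(column)); // prints Real
    }
}
```

If something like this is happening, any ordering that pushes the nulls to the front of the result would mistype (and blank out) the whole column.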
Apart from that, all looks good and runs very quickly.
Also, is it planned for arrays to be handled in a similar way to the browser? I don't currently get any output for them using the driver in this way.
Adrian
@whitfa99 if you could share a minimal dataset to work on together then it'd be great.
This dataset also reproduces it:
WITH REDUCE (Build=[], n in range(1,500)|Build+{NodeNo:n,adouble:CASE WHEN n>200 THEN n*0.01 END}) as mapcollection UNWIND mapcollection AS map CREATE (n:mynode) SET n = map RETURN n
If you then run this in Rapidminer:
Match (n:mynode) RETURN n.NodeNo,n.adouble ORDER BY n.NodeNo DESC
Then you get what you expect, however if you run this:
Match (n:mynode) RETURN n.NodeNo,n.adouble ORDER BY n.NodeNo ASC
Then you only get null values for n.adouble
Great, thank you. Let me try.
This works correctly in version 6; see:
try (var result = stmt.executeQuery("MATCH (n:mynode) RETURN n.aninteger ORDER BY n.order_prop ASC")) {
    assertThat(result.next()).isTrue();
    assertThat(result.getInt(1)).isEqualTo(42);
    assertThat(result.next()).isTrue();
    assertThat(result.getInt(1)).isZero();
    assertThat(result.wasNull()).isTrue();
    assertThat(result.next()).isFalse();
}
I cannot test in RapidMiner, as the latest version of the product only works with a quite dated JDK 11; therefore I am keeping this issue open.