ml-models

Could not initialize class org.nd4j.linalg.factory.Nd4j

Open paltusplintus opened this issue 6 years ago • 8 comments

I copied the 3rd release .jar file to the plugins folder and set the following config (running on neo4j 3.5):

dbms.security.procedures.whitelist=regression.*,embedding.*
dbms.security.procedures.unrestricted=regression.*,embedding.*

The procedures embedding.deepWalk and embedding.deepgl cannot be run in the neo4j browser due to the error: neo4j Could not initialize class org.nd4j.linalg.factory.Nd4j

It seems like the compiled file lacks some dependencies.

Sorry I cannot build myself as I am not Java programmer.

Does anybody have the same issue? Thx.

paltusplintus • Dec 22 '18 20:12

@paltusplintus I am having the same issue for both deepgl and deepWalk on Neo4j 3.5.6 with version 1.0.3 of ml-models via the precompiled JAR. The exact error message is:

Failed to invoke procedure `embedding.deepgl`: 
Caused by: java.lang.NoClassDefFoundError: 
Could not initialize class org.nd4j.linalg.factory.Nd4j

apoc.*, algo.*, embedding.* are all properly whitelisted, installed, and accessible.

@mneedham @jexp @meltzerpete Do we need to recompile from source? Maybe whitelisting regression.* is required, even though those are visible via dbms.procedures()?
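For context on the wording of the error, here is a minimal Java sketch of how the JVM arrives at "Could not initialize class". The `Backend` class is a hypothetical stand-in, not ND4J code: its static initializer fails the way a class might if a native backend could not be loaded. The first access surfaces the real cause as an ExceptionInInitializerError; every later access only reports NoClassDefFoundError, which is why the root cause usually has to be found earlier in the console output.

```java
public class InitFailureDemo {
    // Hypothetical stand-in for a class whose static initializer fails.
    static class Backend {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final int VALUE = load();

        static int load() {
            throw new IllegalStateException("no native backend found");
        }
    }

    // Touches Backend and returns the simple name of the Throwable that results.
    static String touch() {
        try {
            return String.valueOf(Backend.VALUE);
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(touch()); // first access: initializer runs and fails
        System.out.println(touch()); // later access: class is marked as unusable
    }
}
```

If the message seen in the browser is the NoClassDefFoundError variant, the initializer most likely failed on an earlier call, and the original exception would have been reported at that point.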

jameswweis • Jul 04 '19 17:07

@paltusplintus @jameswweis could you give a bit more information about your setup, i.e. operating system and CPU architecture? Is there any additional information in the neo4j log? Sometimes I find that I need the neo4j log open while the error occurs to see the error messages (they aren't saved in the log file, just printed to stderr and shown in red in the console output).

meltzerpete • Jul 08 '19 07:07

@meltzerpete Definitely, thanks for the help. Kindly see below:

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
Stepping:              4
CPU MHz:               800.000
CPU max MHz:           2201.0000
CPU min MHz:           800.0000
BogoMIPS:              4401.47
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              14080K
NUMA node0 CPU(s):     0-9,20-29
NUMA node1 CPU(s):     10-19,30-39

In debug.log, the only relevant line is: 2019-07-08 15:26:05.040+0000 INFO [o.n.k.i.p.Procedures] Executing DeepWalk with params: {walkLength=10, windowSize=2, numberOfWalks=10, vectorSize=10, learningRate=0.01}

After which Neo4j fails with Failed to invoke procedure `embedding.dl4j.deepWalk`: Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.nd4j.linalg.factory.Nd4j.

The query I am running (although it happens for all queries that I've tried, and for both deepWalk and deepgl) is effectively:

CALL embedding.dl4j.deepWalk('
MATCH (q:Node)-[:FROM]->(z:Year)
WHERE z.value <= 1900
RETURN id(q) as id
','
MATCH (q1:Node)-[:RELATED]->(q2: Node)
RETURN id(q1) AS source, id(q2) AS target
',{graph:'cypher', write:true, writeProperty:"temporary"});

I'm running Neo4j from the latest (3.5) Docker instance. The relevant section of my Docker configuration (checkpoints extensively delayed due to read/write issues on this node) is as follows:

        environment:
            - NEO4J_dbms_memory_heap_initial__size=31g
            - NEO4J_dbms_memory_heap_max__size=31g
            - NEO4J_dbms_memory_pagecache_size=600g
            - NEO4J_dbms_tx__log_rotation_retention__policy=false
            - NEO4J_dbms_tx__log_rotation_size=1M
            - NEO4J_unsupported_dbms_tx__log_fail__on__corrupted__log__files=false
            - NEO4J_dbms_checkpoint_iops_limit=-1
            - NEO4J_dbms_checkpoint_interval_time=42h
            - NEO4J_dbms_checkpoint_interval_tx=1000000000
            - NEO4J_dbms_config_strict__validation=false
            - NEO4J_dbms_security_procedures_unrestricted=apoc.*,embedding.*,algo.*
            - NEO4J_dbms_security_procedures_whitelist=apoc.*,embedding.*,algo.*
            - NEO4J_apoc_export_file_enabled=true
            - NEO4J_apoc_import_file_enabled=true
            - NEO4J_dbms_shell_enabled=true

Let me know if you need anything else.

jameswweis • Jul 08 '19 15:07

@jameswweis no problem. It definitely looks like a dependency issue. I can't reproduce this error on my machine, but here are some things you could try:

  • the log most likely to show more information is the main neo4j log (see image). Anything printed to stderr (where the exceptions and stack trace go) is not written to the log file, so you need this view open in advance; opening it afterwards will not show the errors. If there's any further info there in red, just paste it here
  • do you have any other JARs in the plugins folder? Sometimes conflicting library versions cause problems and the right classes cannot be found
  • failing this, you could try to build from source and remove the nd4j-native-platform exclusions: edit the pom.xml file and remove any <exclusion>...</exclusion> sections that mention nd4j-native-platform, remove any <classifier>...</classifier> elements (tags included) inside the nd4j-native-platform dependencies, and then remove any sections that are now duplicates (i.e. where they previously differed only for linux/mac). The idea is to include the complete libraries rather than filtering out just the desired parts. You should also be able to cut the nd4j-native sections completely, leaving only the full nd4j-native-platform. Sorry if this doesn't work; it's off the top of my head and I haven't had time to try it out. You can then package it with maven using mvn clean package, or mvn clean package -DskipTests if a test causes the build to fail, and copy the JAR from the target folder into your plugins folder.
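Before rebuilding, it may help to check what a given JAR actually bundles. The following is a rough sketch using only the JDK's java.util.jar API; the class name `JarContentsCheck` and the helper `entriesMatching` are made up for illustration, and treating `.so`, `.dll`, and `.dylib` entries as the native libraries is an assumption about how the backend is packaged.

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

public class JarContentsCheck {
    // Returns the names of all entries in the JAR whose path contains the fragment.
    static List<String> entriesMatching(String jarPath, String fragment) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                      .map(entry -> entry.getName())
                      .filter(name -> name.contains(fragment))
                      .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        String jarPath = args[0]; // e.g. the ml-models JAR in the plugins folder

        // Is the class named in the error message inside the JAR at all?
        boolean hasFactory =
            !entriesMatching(jarPath, "org/nd4j/linalg/factory/Nd4j.class").isEmpty();
        System.out.println("Nd4j factory class bundled: " + hasFactory);

        // Assumption: native backends ship as shared libraries inside the JAR.
        for (String ext : new String[] {".so", ".dll", ".dylib"}) {
            System.out.println(ext + " entries: " + entriesMatching(jarPath, ext));
        }
    }
}
```

If the class file is present but no shared library for the host platform is listed, that would be consistent with the initializer failing for lack of a native backend, though this check alone does not prove it.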

log

meltzerpete • Jul 08 '19 16:07

Thanks, @meltzerpete. Regarding your questions:

(1) I didn't see anything in the neo4j.log file within the Docker container. I will check again after restarting our database and update if that changes.

(2) Yes, I have the APOC, graph-algorithms, and ml-models JARs:

$ ls plugins
apoc-3.5.0.3-all.jar  graph-algorithms-algo-3.5.4.0.jar  neo4j-ml-models-1.0.3.jar

Would any of these cause conflicts, do you know?
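One way to look for conflicts is to scan the plugins folder for class files that appear in more than one JAR. This is only a sketch under assumptions: `DuplicateClassScan` and `duplicates` are illustrative names, and a duplicate class is a hint of a possible version clash rather than proof that it causes this error.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarFile;

public class DuplicateClassScan {
    // Maps each class entry under the given prefix to the JARs that contain it,
    // keeping only entries found in two or more JARs.
    static Map<String, List<String>> duplicates(Path dir, String prefix) throws IOException {
        Map<String, List<String>> owners = new TreeMap<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jarPath : jars) {
                String jarName = jarPath.getFileName().toString();
                try (JarFile jar = new JarFile(jarPath.toFile())) {
                    jar.stream()
                       .map(entry -> entry.getName())
                       .filter(name -> name.endsWith(".class") && name.startsWith(prefix))
                       .forEach(name ->
                           owners.computeIfAbsent(name, k -> new ArrayList<>()).add(jarName));
                }
            }
        }
        owners.values().removeIf(jarNames -> jarNames.size() < 2);
        return owners;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: plugins directory; args[1]: package prefix such as "org/nd4j/"
        duplicates(Paths.get(args[0]), args[1])
            .forEach((cls, jarNames) -> System.out.println(cls + " -> " + jarNames));
    }
}
```

An empty result for a prefix such as `org/nd4j/` would suggest the other plugins do not ship their own copies of those classes.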

(3) Thanks for the details. If none of the above helps, I'll try rebuilding from source as you recommend.

jameswweis • Jul 08 '19 16:07

@jameswweis ah right, sorry, I missed the part above where you said you were running in docker; I didn't read it properly. I think there is an issue with nd4j and docker. Maybe there is a solution here: https://gitter.im/deeplearning4j/deeplearning4j/archives/2018/05/07. Otherwise, can you try running without docker?

meltzerpete • Jul 08 '19 19:07

@paltusplintus @meltzerpete I'm having this issue as well.

I'm using ml-models-1.0.3 which I compiled myself without docker. I followed your instructions above to change the pom file so as to remove exclusions and classifier tags for the nd4j-native-platform. However, when I run CALL dbms.procedures() none of the embedding methods show up - only the regression methods are present.

The plugins I have installed are apoc-3.3.0.1.jar, graphQL-3.3.0.0.jar, graphAlgorithms-3.3.0.0.jar, neo4j-ml-models-1.0.2.jar

I have also set:

dbms.security.procedures.unrestricted=algo.*,apoc.*,regression.*, embedding.*
dbms.security.procedures.whitelist=algo.*,apoc.*,regression.*, embedding.*

Any other idea on how we might be able to fix this?

timholds • Oct 02 '19 05:10

@timholds I'm not sure what the problem is, but if the procs are not listed then it is likely something that goes wrong during database startup, when it scans for them. I'm a little unclear on the problem you are facing. Could you confirm that you also get the error Could not initialize class org.nd4j.linalg.factory.Nd4j? Is it that you got this error, then made the suggested change, and now the procedures are not listed? Or am I misunderstanding?

Also, if you could post a copy of the logs/debug.log entries during database startup that might have some information about why the procedures aren't being registered.

meltzerpete • Oct 03 '19 16:10