Neo4j, ETL Configuration, and Job Scheduler installation instructions

Open dje04001 opened this issue 4 years ago • 4 comments

How to install the Neo4j Component for GTAS Link Analysis

https://youtu.be/cDXOYdAVTHc

Configuring the ETL job

sudo mkdir -p /gtas-neo4j-etl/{config,job,log,job/temp}
sudo chown -R gtas-admin /gtas-neo4j-etl
sudo chmod -R 755 /gtas-neo4j-etl
cp /opt/GTAS/gtas-neo4j-etl/job/*.ktr /gtas-neo4j-etl/job
cp /opt/GTAS/gtas-neo4j-etl/job/*.kjb /gtas-neo4j-etl/job
cp -r /opt/GTAS/gtas-neo4j-etl/config/. /gtas-neo4j-etl/config
sudo chown -R gtas-admin /gtas-neo4j-etl/
sudo chmod -R 755 /gtas-neo4j-etl/

Edit gtas-neo4j-config.properties

vim /gtas-neo4j-etl/config/gtas-neo4j-config.properties

Update the config to reflect the values below

...
EXT_VAR_GTAS_DB_USER_NAME=root
EXT_VAR_GTAS_DB_PASSWORD=admin
...
EXT_VAR_NEO4J_DB_USER_NAME=neo4j
EXT_VAR_NEO4J_DB_PASSWORD=admin
...
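
A quick way to confirm that the four settings above are present after editing is sketched below. `check_required_props` is a hypothetical helper, not part of GTAS, and it assumes the file uses plain `KEY=value` lines with no spaces around the equals sign, as shown above.

```shell
# Print each required key that is missing or has an empty value in a
# Java-style .properties file. Returns non-zero if any key is missing.
# check_required_props is a hypothetical helper, not part of GTAS.
check_required_props() {
  file="$1"
  status=0
  for key in EXT_VAR_GTAS_DB_USER_NAME EXT_VAR_GTAS_DB_PASSWORD \
             EXT_VAR_NEO4J_DB_USER_NAME EXT_VAR_NEO4J_DB_PASSWORD; do
    # Match "KEY=value" with at least one character after the equals sign.
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing or empty: $key"
      status=1
    fi
  done
  return "$status"
}
```

For example: check_required_props /gtas-neo4j-etl/config/gtas-neo4j-config.properties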

Install and configure Neo4j

sudo chmod -R u+rw /opt
cd /opt
sudo wget http://dist.neo4j.org/neo4j-community-3.5.3-unix.tar.gz
sudo tar -xzf neo4j-community-3.5.3-unix.tar.gz
sudo chown -R gtas-admin /opt/neo4j-community-3.5.3
sudo chmod -R 755 /opt/neo4j-community-3.5.3

Edit the config file

vim /opt/neo4j-community-3.5.3/conf/neo4j.conf

Update the config to reflect the values below

line 9 dbms.active_database=gtas.db
line 26 dbms.security.auth_enabled=true
line 62 dbms.connectors.default_advertised_address=localhost
line 71 dbms.connector.bolt.listen_address=:7687
line 75 dbms.connector.http.listen_address=:7474
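
The line numbers above may not match every copy of neo4j.conf, so the same settings can also be applied by key. The sketch below is one way to do that; `set_conf` is a hypothetical helper, and it assumes each setting appears at most once in the file, either commented out with a leading `#` or already active.

```shell
# Set KEY=VALUE in a conf file. An existing line for the key is rewritten
# (whether or not it is commented out); otherwise the setting is appended.
# set_conf is a hypothetical helper, not part of Neo4j or GTAS.
set_conf() {
  key="$1"; value="$2"; file="$3"
  # Escape dots so the key is matched literally in the regular expression.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -Eq "^#?${pattern}=" "$file"; then
    # sed -i.bak edits in place and leaves a backup copy next to the file.
    sed -i.bak -E "s|^#?${pattern}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

For example: set_conf dbms.active_database gtas.db /opt/neo4j-community-3.5.3/conf/neo4j.conf
The values used in this guide contain no `|`, `&`, or backslash characters, which this sketch does not escape.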

Install Pentaho ETL tool

sudo mkdir -p /opt/pentaho
sudo chown -R gtas-admin /opt/pentaho
sudo chmod -R 755 /opt/pentaho

cd /opt/pentaho
wget https://s3.amazonaws.com/kettle-neo4j/kettle-neo4j-remix-8.2.0.3-519-REMIX.zip
unzip kettle-neo4j-remix-8.2.0.3-519-REMIX.zip -d /opt/pentaho

cp /opt/GTAS/gtas-neo4j-etl/drivers/mariadb-java-client-2.2.1.jar /opt/pentaho/data-integration/lib
sudo chown -R gtas-admin /opt/pentaho
sudo chmod -R 755 /opt/pentaho

cp -r /opt/GTAS/gtas-neo4j-etl/pdi-conf/. ~/
chown -R gtas-admin ~/.pentaho
chmod -R 755 ~/.pentaho

Install the ETL Job Scheduler

cd /opt/GTAS/gtas-neo4j-scheduler
sudo chown -R gtas-admin /opt/GTAS
sudo chmod -R 755 /opt/GTAS
mvn clean install -Dskip.unit.tests=true
cp ./target/gtas-neo4j-job-scheduler-1.jar /gtas-neo4j-etl
chmod 755 /gtas-neo4j-etl/gtas-neo4j-job-scheduler-1.jar

Add required views to the database

log into mariadb
use gtas;
source /opt/GTAS/gtas-neo4j-etl/sql/neo4j_hit_vw.sql
source /opt/GTAS/gtas-neo4j-etl/sql/neo4j_vw.sql
quit
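
After sourcing the two scripts, it can help to confirm that the views exist and return rows, since an empty view gives the ETL job nothing to load. The sketch below only generates the SQL. `view_count_sql` is a hypothetical helper, and the view names `neo4j_vw` and `neo4j_hit_vw` are assumed from the script file names above.

```shell
# Emit one COUNT(*) query per view name given as an argument.
# view_count_sql is a hypothetical helper; it only prints SQL text and
# does not connect to any database itself.
view_count_sql() {
  for view in "$@"; do
    printf "SELECT '%s' AS view_name, COUNT(*) AS row_count FROM %s;\n" "$view" "$view"
  done
}
```

For example: view_count_sql neo4j_vw neo4j_hit_vw | mysql -u root -p gtas
A query that fails points to a view that was not created; a row_count of 0 means the view exists but currently selects no rows.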

Start Neo4j and ETL Job Scheduler

/opt/neo4j-community-3.5.3/bin/neo4j start

Update the password to 'admin' in Neo4j UI in web browser

cd /gtas-neo4j-etl
java -jar gtas-neo4j-job-scheduler-1.jar

dje04001 avatar Jul 31 '20 16:07 dje04001

I installed everything successfully, but I'm getting "There are no labels in Database" under Node Labels and "There are no properties in Database" under Property Keys. My database is running well. Please help me.

pradyuman98 avatar Aug 08 '20 12:08 pradyuman98

Hi @pradyuman98. Which branch / release are you using? If you can post a server log for the ETL job, that would be helpful as well. Please double-check that the neo4j_vw table was imported properly into your MariaDB. Thank you--

dje04001 avatar Aug 10 '20 14:08 dje04001

Hi @pradyuman98. Which branch / release are you using? If you can post a server log for the ETL job, that would be helpful as well. Please double-check that the neo4j_vw table was imported properly into your MariaDB. Thank you--

The neo4j_vw table was created successfully, but it contains no data. Querying it returns "Empty set".

Logs:
[root@localhost ~]# /opt/neo4j-community-3.5.3/bin/neo4j start
Active database: gtas.db
Directories in use:
home: /opt/neo4j-community-3.5.3
config: /opt/neo4j-community-3.5.3/conf
logs: /opt/neo4j-community-3.5.3/logs
plugins: /opt/neo4j-community-3.5.3/plugins
import: /opt/neo4j-community-3.5.3/import
data: /opt/neo4j-community-3.5.3/data
certificates: /opt/neo4j-community-3.5.3/certificates
run: /opt/neo4j-community-3.5.3/run
Starting Neo4j.
WARNING: Max 1024 open files allowed, minimum of 40000 recommended. See the Neo4j manual.
Started neo4j (pid 3523). It is available at http://localhost:7474/ There may be a short delay until the server is ready.
See /opt/neo4j-community-3.5.3/logs/neo4j.log for current status.
[root@localhost ~]#
[root@localhost ~]#
[root@localhost ~]# cd /gtas-neo4j-etl
[root@localhost gtas-neo4j-etl]# java -jar gtas-neo4j-job-scheduler-1.jar

:: Spring Boot :: (v2.0.5.RELEASE)

2020-08-10 23:06:36.368 INFO 3615 --- [ main] g.g.s.GtasNeo4jJobSchedulerApplication : Starting GtasNeo4jJobSchedulerApplication v1 on localhost.localdomain with PID 3615 (/gtas-neo4j-etl/gtas-neo4j-job-scheduler-1.jar started by root in /gtas-neo4j-etl)
2020-08-10 23:06:36.477 INFO 3615 --- [ main] g.g.s.GtasNeo4jJobSchedulerApplication : No active profile set, falling back to default profiles: default
2020-08-10 23:06:38.474 INFO 3615 --- [ main] s.c.a.AnnotationConfigApplicationContext : Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@1996cd68: startup date [Mon Aug 10 23:06:38 EDT 2020]; root of context hierarchy
2020-08-10 23:06:43.826 INFO 3615 --- [ main] gov.gtas.scheduler.Neo4jScheduledTasks : --------SCHEDULER PROPERTIES FROM PROPERTIES FILE -----
2020-08-10 23:06:43.827 INFO 3615 --- [ main] gov.gtas.scheduler.Neo4jScheduledTasks : - execInterval: 60
2020-08-10 23:06:43.827 INFO 3615 --- [ main] gov.gtas.scheduler.Neo4jScheduledTasks : - opSystem: linux
2020-08-10 23:06:43.827 INFO 3615 --- [ main] gov.gtas.scheduler.Neo4jScheduledTasks : - pdiDir: /opt/pentaho/data-integration/./kitchen.sh
2020-08-10 23:06:43.828 INFO 3615 --- [ main] gov.gtas.scheduler.Neo4jScheduledTasks : - jobDir: /gtas-neo4j-etl/job/gtas-to-neo-job.kjb
2020-08-10 23:06:43.828 INFO 3615 --- [ main] gov.gtas.scheduler.Neo4jScheduledTasks : - logLevel: Minimal
2020-08-10 23:06:43.828 INFO 3615 --- [ main] gov.gtas.scheduler.Neo4jScheduledTasks : - logDir: /gtas-neo4j-etl/log/gtas-neo4j
2020-08-10 23:06:43.828 INFO 3615 --- [ main] gov.gtas.scheduler.Neo4jScheduledTasks : - configFilePropertyName: EXT_ETL_CONFIG_FILE
2020-08-10 23:06:43.829 INFO 3615 --- [ main] gov.gtas.scheduler.Neo4jScheduledTasks : - configFile: /gtas-neo4j-etl/config/gtas-neo4j-config.properties
2020-08-10 23:06:43.829 INFO 3615 --- [ main] gov.gtas.scheduler.Neo4jScheduledTasks : ----------------------------------
2020-08-10 23:06:45.301 INFO 3615 --- [ main] o.s.j.e.a.AnnotationMBeanExporter : Registering beans for JMX exposure on startup
2020-08-10 23:06:45.498 INFO 3615 --- [ main] s.a.ScheduledAnnotationBeanPostProcessor : No TaskScheduler/ScheduledExecutorService bean found for scheduled processing
2020-08-10 23:06:45.818 INFO 3615 --- [ main] g.g.s.GtasNeo4jJobSchedulerApplication : Started GtasNeo4jJobSchedulerApplication in 14.517 seconds (JVM running for 19.498)
2020-08-10 23:06:45.823 INFO 3615 --- [ main] g.g.s.GtasNeo4jJobSchedulerApplication : THE GTAS-NEO4J JOB SCHEDULER IS STARTING......
2020-08-10 23:06:45.657 INFO 3615 --- [pool-2-thread-1] gov.gtas.scheduler.Neo4jScheduledTasks : Starting the thread to execute the PDI job ....
2020-08-10 23:06:45.831 INFO 3615 --- [pool-2-thread-1] gov.gtas.scheduler.thread.RunnableTask : COMMAND LINE: /opt/pentaho/data-integration/./kitchen.sh -file='/gtas-neo4j-etl/job/gtas-to-neo-job.kjb' -param:EXT_ETL_CONFIG_FILE='/gtas-neo4j-etl/config/gtas-neo4j-config.properties' -level=Minimal >> /gtas-neo4j-etl/log/gtas-neo4j_20200810.log
2020-08-10 23:06:45.831 INFO 3615 --- [pool-2-thread-1] gov.gtas.scheduler.thread.RunnableTask : *** LAUNCHING PDI ETL JOB FROM SCHEDULER ****
2020-08-10 23:08:19.606 INFO 3615 --- [pool-2-thread-1] gov.gtas.scheduler.thread.RunnableTask : *** END OF ETL JOB FROM SCHEDULER .....EXIT VALUE = 0
2020-08-10 23:09:19.610 INFO 3615 --- [pool-2-thread-1] gov.gtas.scheduler.Neo4jScheduledTasks : Starting the thread to execute the PDI job ....
2020-08-10 23:09:19.611 INFO 3615 --- [pool-2-thread-1] gov.gtas.scheduler.thread.RunnableTask : COMMAND LINE: /opt/pentaho/data-integration/./kitchen.sh -file='/gtas-neo4j-etl/job/gtas-to-neo-job.kjb' -param:EXT_ETL_CONFIG_FILE='/gtas-neo4j-etl/config/gtas-neo4j-config.properties' -level=Minimal >> /gtas-neo4j-etl/log/gtas-neo4j_20200810.log
2020-08-10 23:09:19.612 INFO 3615 --- [pool-2-thread-1] gov.gtas.scheduler.thread.RunnableTask : *** LAUNCHING PDI ETL JOB FROM SCHEDULER ****
2020-08-10 23:10:20.348 INFO 3615 --- [pool-2-thread-1] gov.gtas.scheduler.thread.RunnableTask : *** END OF ETL JOB FROM SCHEDULER .....EXIT VALUE = 0
^Z
[1]+ Stopped java -jar gtas-neo4j-job-scheduler-1.jar

pradyuman98 avatar Aug 10 '20 14:08 pradyuman98
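
The scheduler output above reports an exit value for each ETL run, while the output of the job itself is redirected to the file named in the COMMAND LINE entry (/gtas-neo4j-etl/log/gtas-neo4j_20200810.log). A minimal sketch for pulling the exit values out of saved scheduler output follows. `etl_exit_values` is a hypothetical helper, and it assumes the "EXIT VALUE = N" wording shown in the log above.

```shell
# Print the exit value of every finished ETL run found in scheduler output
# read from standard input, one value per line.
# etl_exit_values is a hypothetical helper, not part of GTAS.
etl_exit_values() {
  # grep -o prints each match on its own line, so several entries on one
  # physical line are all reported; sed then keeps only the number.
  grep -o 'EXIT VALUE = [0-9][0-9]*' | sed 's/.*= //'
}
```

For example, with the scheduler output saved to a file (scheduler.out is an assumed name): etl_exit_values < scheduler.out
An exit value of 0 only shows that kitchen.sh returned normally; how many rows were read and written is recorded in the ETL log file itself.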

Please Help!!!!!!!

pradyuman98 avatar Aug 18 '20 05:08 pradyuman98