janusgraph-util

Could this tool import CSV into the cql and es backends? [cassandra 3.11][janusgraph 0.3.0]

Open juncaofish opened this issue 7 years ago • 10 comments

It seems cql support hasn't been implemented in the following code:

public class ProxyBulkLoader implements BulkLoader {

    private BulkLoader real;

    public ProxyBulkLoader(StandardJanusGraph graph) {
        String backend = graph.getConfiguration().getConfiguration().get(STORAGE_BACKEND);
        if ("cassandrathrift".equals(backend)) {
            real = new CassandraSSTableLoader();
        } else if ("hbase".equals(backend)) {
            // ignore
        } else if ("bigtable".equals(backend)) {
            // ignore
        } else {
            // no branch for "cql": `real` is never assigned
        }
    }
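
For reference, a cql branch would presumably have to go right here. The sketch below is only an illustration of where the missing case would sit; CqlSSTableLoader is a hypothetical class that does not exist in this repo.

// Hypothetical sketch only: CqlSSTableLoader is made up for illustration.
// It shows where a "cql" case would have to be added in ProxyBulkLoader.
public ProxyBulkLoader(StandardJanusGraph graph) {
    String backend = graph.getConfiguration().getConfiguration().get(STORAGE_BACKEND);
    if ("cassandrathrift".equals(backend)) {
        real = new CassandraSSTableLoader();
    } else if ("cql".equals(backend)) {
        real = new CqlSSTableLoader();   // hypothetical loader for a CQL-configured cluster
    } else if ("hbase".equals(backend) || "bigtable".equals(backend)) {
        // ignore
    } else {
        // other backends remain unhandled
    }
}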

My config file looks like this:

cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5

storage.backend=cql
storage.cql.keyspace=graph1
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.hostname=xx

index.base.index-name=graph1
index.base.backend=elasticsearch
index.base.hostname=xx:xx
index.base.elasticsearch.create.ext.number_of_shards=3
index.base.elasticsearch.create.ext.number_of_replicas=1
index.base.elasticsearch.create.ext.shard.check_on_startup=true
index.base.elasticsearch.create.ext.refresh_interval=10s

query.batch=true
ids.block-size=100000000
ids.renew-percentage=0.3
schema.default=none
storage.batch-loading=true

juncaofish avatar Sep 26 '18 09:09 juncaofish

Yes, it can import CSV into a Cassandra backend, but I set storage.backend=cassandrathrift instead of storage.backend=cql. My config looks like this:

storage.backend=cassandrathrift
storage.hostname=localhost
storage.cassandra.keyspace=janusgraph
index.search.backend=elasticsearch
index.search.hostname=localhost
storage.buffer-size=10000
ids.block-size=10000000

dengziming avatar Oct 08 '18 03:10 dengziming

I set the config with:

storage.backend=cassandrathrift

The import log indicates data was imported:

[node]389[map]52[edge]272 .......... .......... .......... .......... .......... 5% .......... .......... .......... .......... .......... 10% .......... .......... .......... .......... .......... 15% .......... .......... .......... .......... .......... 20% .......... .......... .......... .......... .......... 25% .......... .......... .......... .......... .......... 30% .......... .......... .......... .......... .......... 35% .......... .......... .......... .......... .......... 40% .......... .......... .......... .......... .......... 45% .......... .......... .......... .......... .......... 50% .......... .......... .......... .......... .......... 55% .......... .......... .......... .......... .......... 60% .......... .......... .......... .......... .......... 65% .......... .......... .......... .......... .......... 70% .......... .......... .......... .......... .......... 75% .......... .......... .......... .......... .......... 80% .......... .......... .......... .......... .......... 85% .......... .......... .......... .......... .......... 90% .......... .......... .......... .......... .......... 95% .......... .......... .......... .......... .......... 100%

IMPORT DONE in 1s 193ms. Imported: 14 nodes 18 edges 31 properties Peak memory usage: 8000060

followed by this exception:

InvalidRequestException(why:unconfigured table schema_columnfamilies)
java.lang.RuntimeException: Could not retrieve endpoint ranges: 
        at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:344)
        at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:156)
        at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:111)
        at janusgraph.util.batchimport.unsafe.load.cassandra.CassandraSSTableLoader.load(CassandraSSTableLoader.java:86)
        at janusgraph.util.batchimport.unsafe.load.ProxyBulkLoader.load(ProxyBulkLoader.java:39)
        at janusgraph.util.batchimport.unsafe.BulkLoad.main(BulkLoad.java:311)
Caused by: InvalidRequestException(why:unconfigured table schema_columnfamilies)
        at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result$execute_cql3_query_resultStandardScheme.read(Cassandra.java:50297)
        at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result$execute_cql3_query_resultStandardScheme.read(Cassandra.java:50274)
        at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result.read(Cassandra.java:50189)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql3_query(Cassandra.java:1734)
        at org.apache.cassandra.thrift.Cassandra$Client.execute_cql3_query(Cassandra.java:1719)
        at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:323)

When I query the graph, it seems nothing has been imported yet. Could you help me figure out what's going on with the exception?

juncaofish avatar Oct 08 '18 07:10 juncaofish

@juncaofish I think your Cassandra version is 3.x? The bulk load uses neither Thrift nor CQL, so the Cassandra version on the JanusGraph side has to be consistent with the Cassandra version of your cluster. For example, if your cluster runs Cassandra 2.x, the JanusGraph side should also use a 2.x Cassandra dependency.

You can either change the Cassandra dependency version of your JanusGraph build to 3.x, or change your local Cassandra to 2.x (I am using 2.1.8).
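
A quick way to check which versions are actually in play is shown below (nodetool ships with Cassandra; the lib/ path is only illustrative, look wherever your build collects its dependencies):

# version of the running Cassandra cluster
nodetool version
# which cassandra-all jar the import tool bundles (lib/ path is illustrative)
ls lib/ | grep cassandra-all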

dengziming avatar Oct 09 '18 05:10 dengziming

We use Cassandra 3.11 and JanusGraph 0.3.0. Is that consistent?

juncaofish avatar Oct 09 '18 09:10 juncaofish

@juncaofish
I haven't tried Cassandra 3.11, but I think it should be OK. Just make sure the Cassandra version bundled with JanusGraph is the same as the Cassandra version you run. I will make some changes and try importing into Cassandra 3.0; please wait a few days, sorry!

dengziming avatar Oct 10 '18 05:10 dengziming

@dengziming Great. Thanks for your help.

juncaofish avatar Oct 10 '18 06:10 juncaofish

@juncaofish Hello, I have tried JanusGraph 0.3.0 with Cassandra 3.10. It's difficult to do the bulk import directly, but you can do it using the code on the following branch: https://github.com/dengziming/janusgraph-util/tree/feature_cassandra3

Unfortunately you first have to run the Java command, and then run the sstableloader command manually.

For example, if you run BulkLoad in your IDE, the args look like this:

--into /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db \ 
--janus-config-file janusgraph.properties \ 
--skip-duplicate-nodes true \ 
--skip-bad-relationships true \ 
--ignore-extra-columns true \ 
--ignore-empty-strings true \ 
--bad-tolerance 10000000 \ 
--processors 1 \ 
--id-type string \ 
--max-memory 2G \ 
--nodes:titan /Users/dengziming/opt/data/tmp/v_titan.csv \ 
--nodes:location /Users/dengziming/opt/data/tmp/v_location.csv \ 
--nodes:god /Users/dengziming/opt/data/tmp/v_god.csv \ 
--nodes:demigod /Users/dengziming/opt/data/tmp/v_demigod.csv \ 
--nodes:human /Users/dengziming/opt/data/tmp/v_human.csv \ 
--nodes:monster /Users/dengziming/opt/data/tmp/v_monster.csv \ 
--edges:father /Users/dengziming/opt/data/tmp/e_god_titan_father.csv \ 
--edges:father /Users/dengziming/opt/data/tmp/e_demigod_god_father.csv \ 
--edges:mother /Users/dengziming/opt/data/tmp/e_demigod_human_mother.csv \ 
--edges:lives /Users/dengziming/opt/data/tmp/e_god_location_lives.csv \ 
--edges:lives /Users/dengziming/opt/data/tmp/e_monster_location_lives.csv \ 
--edges:brother /Users/dengziming/opt/data/tmp/e_god_god_brother.csv \ 
--edges:battled /Users/dengziming/opt/data/tmp/e_demigod_monster_battled.csv \ 
--edges:pet /Users/dengziming/opt/data/tmp/e_god_monster_pet.csv 

And after it finishes you will see log output like this:

IMPORT DONE in 1s 11ms. 
Imported:
  12 nodes
  17 edges
  27 properties
Peak memory usage: 8000052
you are using cassandra 3.0, please use `sstableloader` to manually load into cassandra
sstableloader -d localhost /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db/Nodes/0/janusgraph/edgestore
sstableloader -d localhost /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db/Edges/0/janusgraph/edgestore

Then you can run these commands to load the data into JanusGraph.

# here is the command
dengziming@dengzimings-MacBook-Pro:~/Desktop/worknotes$ sstableloader -d localhost /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db/Nodes/0/janusgraph/edgestore
objc[68147]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home//bin/java (0x1039524c0) and /Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/jre/lib/libinstrument.dylib (0x1051bc4e0). One of the two will be used. Which one is undefined.
WARN  16:02:14,081 Only 26.308GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db/Nodes/0/janusgraph/edgestore/mc-1-big-Data.db to [localhost/127.0.0.1]
progress: [localhost/127.0.0.1]0:1/1 100% total: 100% 0.137KiB/s (avg: 0.137KiB/s)
progress: [localhost/127.0.0.1]0:1/1 100% total: 100% 0.000KiB/s (avg: 0.136KiB/s)

Summary statistics:
   Connections per host    : 1
   Total files transferred : 1
   Total bytes transferred : 0.581KiB
   Total duration          : 4255 ms
   Average transfer rate   : 0.136KiB/s
   Peak transfer rate      : 0.137KiB/s

WARN  16:02:18,949 JNA link failure, one or more native method will be unavailable.

# here is the command
dengziming@dengzimings-MacBook-Pro:~/Desktop/worknotes$ sstableloader -d localhost /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db/Edges/0/janusgraph/edgestore
objc[68154]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home//bin/java (0x10a70f4c0) and /Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/jre/lib/libinstrument.dylib (0x10a7d74e0). One of the two will be used. Which one is undefined.
WARN  16:02:21,344 Only 26.316GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db/Edges/0/janusgraph/edgestore/mc-2-big-Data.db to [localhost/127.0.0.1]
progress: [localhost/127.0.0.1]0:1/1 100% total: 100% 0.170KiB/s (avg: 0.170KiB/s)
progress: [localhost/127.0.0.1]0:1/1 100% total: 100% 0.000KiB/s (avg: 0.169KiB/s)

Summary statistics:
   Connections per host    : 1
   Total files transferred : 1
   Total bytes transferred : 0.605KiB
   Total duration          : 3575 ms
   Average transfer rate   : 0.169KiB/s
   Peak transfer rate      : 0.170KiB/s

WARN  16:02:25,488 JNA link failure, one or more native method will be unavailable.

Then I query JanusGraph:

gremlin> g.V().count()
==>12
gremlin> saturn = g.V().has("name", "saturn").next();
==>v[4096]
gremlin> g.V(saturn).in("father").in("father").values("name")
==>hercules

So it works. Good luck!

dengziming avatar Oct 11 '18 08:10 dengziming

Hello, the latest code now uses TxImportStoreImpl to load data. Could this tool add support for cql based on that? Currently I am getting this exception:

janusgraph-import/janusgraph.properties
Input error: Could not find implementation class: org.janusgraph.diskstorage.cql.CQLStoreManager
Caused by: Could not find implementation class: org.janusgraph.diskstorage.cql.CQLStoreManager
java.lang.IllegalArgumentException: Could not find implementation class: org.janusgraph.diskstorage.cql.CQLStoreManager
        at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:60)
        at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:476)
        at org.janusgraph.diskstorage.Backend.getStorageManager(Backend.java:408)
        at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.<init>(GraphDatabaseConfiguration.java:1254)
        at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:160)
        at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:131)
        at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:78)
        at janusgraph.util.batchimport.unsafe.graph.GraphUtil.getGraph(GraphUtil.java:41)
        at janusgraph.util.batchimport.unsafe.BulkLoad.getGraph(BulkLoad.java:417)
        at janusgraph.util.batchimport.unsafe.BulkLoad.main(BulkLoad.java:284)
Caused by: java.lang.ClassNotFoundException: org.janusgraph.diskstorage.cql.CQLStoreManager
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:56)
        ... 9 more

cljxhouse avatar Feb 17 '19 12:02 cljxhouse

@cljxhouse If you use TxImportStoreImpl to load data into Cassandra, you can use cql to persist the data.

If you get the "Could not find implementation class" error, you can solve it by adding the janusgraph-cql dependency to pom.xml.
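
With Maven that would look roughly like the snippet below; the 0.3.0 version is just the one discussed in this thread, so match it to the JanusGraph version you actually build against.

<!-- pulls in CQLStoreManager so the cql backend can be instantiated -->
<dependency>
    <groupId>org.janusgraph</groupId>
    <artifactId>janusgraph-cql</artifactId>
    <version>0.3.0</version>
</dependency>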

dengziming avatar Feb 18 '19 06:02 dengziming

When using sstableloader, I get an error:

[root@localhost janusgraph-import]# sstableloader -d 127.0.0.1 --verbose data/all20190709_01.db/Nodes/0/janusgraph/edgestore
Keyspace system_schema does not exist
com.datastax.driver.core.exceptions.InvalidQueryException: Keyspace system_schema does not exist

The Cassandra version is 3.11.4 and the janusgraph-util version is janusgraph-util-feature_cassandra3.

I don't know whether this is a version or configuration problem, or whether sstableloader no longer supports this Cassandra.

fanweber avatar Jul 10 '19 02:07 fanweber