janusgraph-util
Could this tool import CSV into CQL and ES backends? [cassandra 3.11][janusgraph 0.3.0]
It seems this hasn't been implemented, judging by the following code:
public class ProxyBulkLoader implements BulkLoader {
    private BulkLoader real;

    public ProxyBulkLoader(StandardJanusGraph graph) {
        String backend = graph.getConfiguration().getConfiguration().get(STORAGE_BACKEND);
        if ("cassandrathrift".equals(backend)) {
            real = new CassandraSSTableLoader();
        } else if ("hbase".equals(backend)) {
            // ignore
        } else if ("bigtable".equals(backend)) {
            // ignore
        } else {
            // no loader assigned for other backends (e.g. cql)
        }
    }
    // ...
}
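For reference, supporting CQL would mean adding one more branch to this dispatch. The sketch below is self-contained and only illustrates the selection logic; `BulkLoader` here is a stand-in functional interface, and the `cql` loader is a hypothetical placeholder, not a class that exists in this project:

```java
import java.util.Optional;

// Minimal, self-contained sketch of the backend dispatch above.
// The interface and loaders are stand-ins for the project's real types;
// a real cql branch would construct a CQL-capable loader instead.
public class BackendDispatch {
    interface BulkLoader { String name(); }

    static Optional<BulkLoader> loaderFor(String backend) {
        switch (backend) {
            case "cassandrathrift": {
                BulkLoader sstable = () -> "sstable"; // CassandraSSTableLoader today
                return Optional.of(sstable);
            }
            case "cql": {
                BulkLoader cql = () -> "cql"; // hypothetical CQL-based loader
                return Optional.of(cql);
            }
            default:
                return Optional.empty(); // hbase, bigtable, others: not implemented
        }
    }

    public static void main(String[] args) {
        System.out.println(loaderFor("cql").map(BulkLoader::name).orElse("unsupported"));
        System.out.println(loaderFor("hbase").map(BulkLoader::name).orElse("unsupported"));
    }
}
```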
My config file looks like this:
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
storage.backend=cql
storage.cql.keyspace=graph1
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.hostname=xx
index.base.index-name=graph1
index.base.backend=elasticsearch
index.base.hostname=xx:xx
index.base.elasticsearch.create.ext.number_of_shards=3
index.base.elasticsearch.create.ext.number_of_replicas=1
index.base.elasticsearch.create.ext.shard.check_on_startup=true
index.base.elasticsearch.create.ext.refresh_interval=10s
query.batch=true
ids.block-size=100000000
ids.renew-percentage=0.3
schema.default=none
storage.batch-loading=true
Yes, it can import CSV into a Cassandra backend, but I set storage.backend=cassandrathrift instead of storage.backend=cql. My config looks like:
storage.backend=cassandrathrift
storage.hostname=localhost
storage.cassandra.keyspace=janusgraph
index.search.backend=elasticsearch
index.search.hostname=localhost
storage.buffer-size=10000
ids.block-size=10000000
I set the config with:
storage.backend=cassandrathrift
The import log indicates data has been imported ([node]389[map]52[edge]272), with the progress bar running from 5% to 100%.
IMPORT DONE in 1s 193ms. Imported: 14 nodes 18 edges 31 properties Peak memory usage: 8000060
followed by this exception:
InvalidRequestException(why:unconfigured table schema_columnfamilies)
java.lang.RuntimeException: Could not retrieve endpoint ranges:
at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:344)
at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:156)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:111)
at janusgraph.util.batchimport.unsafe.load.cassandra.CassandraSSTableLoader.load(CassandraSSTableLoader.java:86)
at janusgraph.util.batchimport.unsafe.load.ProxyBulkLoader.load(ProxyBulkLoader.java:39)
at janusgraph.util.batchimport.unsafe.BulkLoad.main(BulkLoad.java:311)
Caused by: InvalidRequestException(why:unconfigured table schema_columnfamilies)
at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result$execute_cql3_query_resultStandardScheme.read(Cassandra.java:50297)
at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result$execute_cql3_query_resultStandardScheme.read(Cassandra.java:50274)
at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result.read(Cassandra.java:50189)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql3_query(Cassandra.java:1734)
at org.apache.cassandra.thrift.Cassandra$Client.execute_cql3_query(Cassandra.java:1719)
at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:323)
When I query the graph, it seems nothing has been imported yet. Could you help me figure out what's going on with the exception?
@juncaofish I think your Cassandra version is 3.x? The bulk load uses neither Thrift nor cqlsh, so the Cassandra version bundled with this tool must match the Cassandra you run against; for example, if you are running Cassandra 2.x, the tool's Cassandra dependency should also be 2.x. (Cassandra 3.0 moved schema metadata out of system.schema_columnfamilies into the system_schema keyspace, which is why a 2.x BulkLoader fails with "unconfigured table schema_columnfamilies" against a 3.x cluster.)
You can change the Cassandra dependency version of your janusgraph-util to 3.x, or change your local Cassandra to 2.x (I am using 2.1.8).
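Concretely, aligning versions means pinning the Cassandra artifact the tool compiles against to the same major line as your server. A sketch of what that could look like in the tool's pom.xml (the exact artifact and version in the project's pom may differ; 3.11.2 is just an example 3.x release):

```xml
<!-- Example only: pin the bundled Cassandra to the same major line as the
     server you stream sstables to. Check the project's pom for the actual
     artifact it uses. -->
<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-all</artifactId>
    <version>3.11.2</version>
</dependency>
```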
We use Cassandra 3.11 and JanusGraph 0.3.0. Is that consistent?
@juncaofish
I haven't tried Cassandra 3.11, but I think it's OK; just ensure that the Cassandra version bundled in Janus is the same as the Cassandra you use.
I will make some changes and try importing into Cassandra 3.0; please wait a few days, sorry!
@dengziming Great. Thanks for your help.
@juncaofish Hello, I have tried JanusGraph 0.3.0 with Cassandra 3.10. Bulk import is difficult there, but you can do it using the code on the following branch: https://github.com/dengziming/janusgraph-util/tree/feature_cassandra3
I'm sorry to say that you must first run the Java command, and then run the sstableloader command manually.
For example, run BulkLoad in your IDE with args like this:
--into /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db \
--janus-config-file janusgraph.properties \
--skip-duplicate-nodes true \
--skip-bad-relationships true \
--ignore-extra-columns true \
--ignore-empty-strings true \
--bad-tolerance 10000000 \
--processors 1 \
--id-type string \
--max-memory 2G \
--nodes:titan /Users/dengziming/opt/data/tmp/v_titan.csv \
--nodes:location /Users/dengziming/opt/data/tmp/v_location.csv \
--nodes:god /Users/dengziming/opt/data/tmp/v_god.csv \
--nodes:demigod /Users/dengziming/opt/data/tmp/v_demigod.csv \
--nodes:human /Users/dengziming/opt/data/tmp/v_human.csv \
--nodes:monster /Users/dengziming/opt/data/tmp/v_monster.csv \
--edges:father /Users/dengziming/opt/data/tmp/e_god_titan_father.csv \
--edges:father /Users/dengziming/opt/data/tmp/e_demigod_god_father.csv \
--edges:mother /Users/dengziming/opt/data/tmp/e_demigod_human_mother.csv \
--edges:lives /Users/dengziming/opt/data/tmp/e_god_location_lives.csv \
--edges:lives /Users/dengziming/opt/data/tmp/e_monster_location_lives.csv \
--edges:brother /Users/dengziming/opt/data/tmp/e_god_god_brother.csv \
--edges:battled /Users/dengziming/opt/data/tmp/e_demigod_monster_battled.csv \
--edges:pet /Users/dengziming/opt/data/tmp/e_god_monster_pet.csv
and after it finishes you will see log output like:
IMPORT DONE in 1s 11ms.
Imported:
12 nodes
17 edges
27 properties
Peak memory usage: 8000052
Since you are using Cassandra 3.0, please use `sstableloader` to load the data into Cassandra manually:
sstableloader -d localhost /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db/Nodes/0/janusgraph/edgestore
sstableloader -d localhost /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db/Edges/0/janusgraph/edgestore
and you can run all of these commands to load the data into JanusGraph.
## here is the command
dengziming@dengzimings-MacBook-Pro:~/Desktop/worknotes$ sstableloader -d localhost /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db/Nodes/0/janusgraph/edgestore
objc[68147]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home//bin/java (0x1039524c0) and /Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/jre/lib/libinstrument.dylib (0x1051bc4e0). One of the two will be used. Which one is undefined.
WARN 16:02:14,081 Only 26.308GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db/Nodes/0/janusgraph/edgestore/mc-1-big-Data.db to [localhost/127.0.0.1]
progress: [localhost/127.0.0.1]0:1/1 100% total: 100% 0.137KiB/s (avg: 0.137KiB/s)
progress: [localhost/127.0.0.1]0:1/1 100% total: 100% 0.000KiB/s (avg: 0.136KiB/s)
Summary statistics:
Connections per host : 1
Total files transferred : 1
Total bytes transferred : 0.581KiB
Total duration : 4255 ms
Average transfer rate : 0.136KiB/s
Peak transfer rate : 0.137KiB/s
WARN 16:02:18,949 JNA link failure, one or more native method will be unavailable.
## here is the command
dengziming@dengzimings-MacBook-Pro:~/Desktop/worknotes$ sstableloader -d localhost /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db/Edges/0/janusgraph/edgestore
objc[68154]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home//bin/java (0x10a70f4c0) and /Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/jre/lib/libinstrument.dylib (0x10a7d74e0). One of the two will be used. Which one is undefined.
WARN 16:02:21,344 Only 26.316GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db/Edges/0/janusgraph/edgestore/mc-2-big-Data.db to [localhost/127.0.0.1]
progress: [localhost/127.0.0.1]0:1/1 100% total: 100% 0.170KiB/s (avg: 0.170KiB/s)
progress: [localhost/127.0.0.1]0:1/1 100% total: 100% 0.000KiB/s (avg: 0.169KiB/s)
Summary statistics:
Connections per host : 1
Total files transferred : 1
Total bytes transferred : 0.605KiB
Total duration : 3575 ms
Average transfer rate : 0.169KiB/s
Peak transfer rate : 0.170KiB/s
WARN 16:02:25,488 JNA link failure, one or more native method will be unavailable.
Then I query JanusGraph:
gremlin> g.V().count()
==>12
gremlin> saturn = g.V().has("name", "saturn").next();
==>v[4096]
gremlin> g.V(saturn).in("father").in("father").values("name")
==>hercules
so it's OK, good luck to you!
Hello, the latest code now uses TxImportStoreImpl to load data; could this tool add support for CQL based on that?
Currently I am getting this exception:
janusgraph-import/janusgraph.properties
Input error: Could not find implementation class: org.janusgraph.diskstorage.cql.CQLStoreManager
Caused by:Could not find implementation class: org.janusgraph.diskstorage.cql.CQLStoreManager
java.lang.IllegalArgumentException: Could not find implementation class: org.janusgraph.diskstorage.cql.CQLStoreManager
at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:60)
at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:476)
at org.janusgraph.diskstorage.Backend.getStorageManager(Backend.java:408)
at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.
@cljxhouse If you use TxImportStoreImpl to load data into Cassandra, you can use CQL to persist the data.
If you get the "Could not find implementation class" error, you can solve it by adding the janusgraph-cql dependency to your pom.xml.
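The dependency addition mentioned above would look roughly like this; the version should match your JanusGraph release (0.3.0 is the one used earlier in this thread):

```xml
<!-- Pulls in org.janusgraph.diskstorage.cql.CQLStoreManager; match the
     version to the JanusGraph release you are running against. -->
<dependency>
    <groupId>org.janusgraph</groupId>
    <artifactId>janusgraph-cql</artifactId>
    <version>0.3.0</version>
</dependency>
```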
When using sstableloader, I get an error:
[root@localhost janusgraph-import]# sstableloader -d 127.0.0.1 --verbose data/all20190709_01.db/Nodes/0/janusgraph/edgestore
Keyspace system_schema does not exist
com.datastax.driver.core.exceptions.InvalidQueryException: Keyspace system_schema does not exist
The Cassandra version is 3.11.4 and the janusgraph-util version is the janusgraph-util-feature_cassandra3 branch.
I don't know whether this is a version or configuration problem, or whether sstableloader no longer supports this Cassandra.