incubator-hugegraph-toolchain
[Feature][Umbrella] Support spark for hugegraph-loader module
Feature Description
Support Spark for the hugegraph-loader module.
Test cmd
The options fall into two groups: Spark options and hugegraph-loader options. They can be given in any order.
./bin/hugegraph-spark-loader.sh \
--master yarn --deploy-mode client --name spark-test \
--conf spark.eventLog.enabled=false \
--conf spark.executor.extraJavaOptions=-XX:+PrintGCDetails \
-f ./conf/spark.json --username admin --token admin \
-h 127.0.0.1 -p 8093 -g my_graph2
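To make the two option groups easier to see, here is a minimal sketch that keeps them in separate variables and prints the assembled command instead of running it. The variable names `SPARK_OPTS` and `LOADER_OPTS` and the function `build_cmd` are hypothetical and not part of the toolchain; the option values are copied from the test command above.

```shell
# Spark options (hypothetical variable name, values from the test command).
SPARK_OPTS="--master yarn --deploy-mode client --name spark-test \
--conf spark.eventLog.enabled=false \
--conf spark.executor.extraJavaOptions=-XX:+PrintGCDetails"

# hugegraph-loader options (hypothetical variable name).
LOADER_OPTS="-f ./conf/spark.json --username admin --token admin \
-h 127.0.0.1 -p 8093 -g my_graph2"

# Print the full command line on a single line (dry run, nothing is executed).
build_cmd() {
  printf '%s\n' "./bin/hugegraph-spark-loader.sh $SPARK_OPTS $LOADER_OPTS"
}

build_cmd
```

Keeping the groups apart like this makes it simple to swap the Spark settings (for example `--master local`) without touching the loader options.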
Task list
- [x] #281 @simon824
- [x] #305 @simon824
- [x] #311 @simon824
- [x] #317 @simon824
- [x] https://github.com/apache/incubator-hugegraph-doc/pull/143 @simon824
- [ ] support metrics statistics for spark-loader
- [ ] support create schema for spark-loader
- [ ] date_format option support for spark-loader
- [ ] other options support for spark-loader
- [ ] bugfix
Environment
- HugeGraph Server version: 0.12.0
- spark-loader version: master branch, v1.0.0
- Spark version: spark-3.1.2-bin-hadoop2.7
Test cmd
./bin/hugegraph-spark-loader.sh \
--master local --deploy-mode client --name spark-test \
--conf spark.eventLog.enabled=false \
--conf spark.executor.extraJavaOptions=-XX:+PrintGCDetails \
-f ./example/file/struct.json \
--username admin \
--token admin \
-h 127.0.0.1 -p 8081 -g hugegraph
Error log
java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkNotNull(Ljava/lang/Object;Ljava/lang/String;Ljava/lang/Object;)Ljava/lang/Object;
at com.baidu.hugegraph.util.E.checkNotNull(E.java:31)
at com.baidu.hugegraph.api.API.<init>(API.java:37)
at com.baidu.hugegraph.api.version.VersionAPI.<init>(VersionAPI.java:31)
at com.baidu.hugegraph.driver.VersionManager.<init>(VersionManager.java:31)
at com.baidu.hugegraph.driver.HugeClient.initManagers(HugeClient.java:94)
at com.baidu.hugegraph.driver.HugeClient.<init>(HugeClient.java:67)
at com.baidu.hugegraph.driver.HugeClientBuilder.build(HugeClientBuilder.java:67)
at com.baidu.hugegraph.loader.util.HugeClientHolder.create(HugeClientHolder.java:76)
at com.baidu.hugegraph.loader.executor.LoadContext.<init>(LoadContext.java:73)
at com.baidu.hugegraph.loader.spark.HugeGraphSparkLoader.initPartition(HugeGraphSparkLoader.java:105)
at com.baidu.hugegraph.loader.spark.HugeGraphSparkLoader.lambda$load$ed5b02be$1(HugeGraphSparkLoader.java:92)
at org.apache.spark.sql.Dataset.$anonfun$foreachPartition$2(Dataset.scala:2917)
at org.apache.spark.sql.Dataset.$anonfun$foreachPartition$2$adapted(Dataset.scala:2917)
at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1020)
at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1020)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2236)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
@JackyYangPassion This is a Guava package conflict. Please check whether different versions of the Guava jar exist under $HUGEGRAPH_HOME/lib and $SPARK_HOME/jars/
cp $HUGEGRAPH_HOME/lib/guava* $SPARK_HOME/jars/
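The version check suggested above can be sketched as a few shell functions. This is only an illustration: the function names (`list_guava`, `guava_versions`, `guava_conflict`) are hypothetical, and the sketch assumes the jars follow the usual `guava-<version>.jar` naming.

```shell
# Print every Guava jar found directly inside the given directory.
list_guava() {
  for jar in "$1"/guava*.jar; do
    # The glob stays literal when nothing matches, so check existence.
    [ -e "$jar" ] && printf '%s\n' "$jar"
  done
  return 0
}

# Print the distinct Guava versions found across all given directories.
# Assumes file names of the form guava-<version>.jar.
guava_versions() {
  for dir in "$@"; do
    list_guava "$dir"
  done | sed -E 's#.*/guava-(.*)\.jar$#\1#' | sort -u
}

# Succeed (exit status 0) when more than one distinct version is present.
guava_conflict() {
  count=$(guava_versions "$@" | grep -c .)
  [ "$count" -gt 1 ]
}
```

For example, `guava_versions "$HUGEGRAPH_HOME/lib" "$SPARK_HOME/jars"` lists the versions present in both directories, and `guava_conflict` with the same arguments succeeds if they differ. Note that `cp` alone adds a jar but does not remove an older Guava jar already in `$SPARK_HOME/jars/`, so re-running the check after copying may still report more than one version.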