
Engine fails to extract UASTs on actual Spark cluster

Open bzz opened this issue 7 years ago • 5 comments

When running in local mode with --packages "tech.sourced:engine:0.6.3", extracting UASTs works.

But after switching to an actual Apache Spark cluster with the same parameters and query, i.e. in Standalone mode, extractUAST fails with

java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.createUnstarted()Lcom/google/common/base/Stopwatch;

Steps to Reproduce

  1. Start Apache Spark in standalone cluster mode and a spark-shell with Engine
    export MASTER_HOST=127.0.0.1
    $SPARK_HOME/sbin/start-master.sh -h $MASTER_HOST -p 7077
    $SPARK_HOME/sbin/start-slave.sh $MASTER_HOST:7077
    $SPARK_HOME/bin/spark-shell --master "spark://$MASTER_HOST:7077" --packages "tech.sourced:engine:0.6.3"
    
  2. Run extractUASTs
    import tech.sourced.engine._
    val path = "<path-to-siva-files>"
    val engine = Engine(spark, path, "siva")
    
    val repos = engine.getRepositories
    val files = repos.getHEAD
         .getCommits
         .getTreeEntries
         .getBlobs
    val uast = files.extractUASTs
    
    uast.count
    

Expected Behavior

The number of extracted UASTs is returned.

Current Behavior

java.lang.NoSuchMethodError

18/06/18 10:39:00 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 192.168.1.37, executor 0): java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.createUnstarted()Lcom/google/common/base/Stopwatch;
	at io.grpc.internal.GrpcUtil$4.get(GrpcUtil.java:566)
	at io.grpc.internal.GrpcUtil$4.get(GrpcUtil.java:563)
	at io.grpc.internal.CensusStatsModule$ClientCallTracer.<init>(CensusStatsModule.java:333)
	at io.grpc.internal.CensusStatsModule.newClientCallTracer(CensusStatsModule.java:137)
	at io.grpc.internal.CensusStatsModule$StatsClientInterceptor.interceptCall(CensusStatsModule.java:672)
	at io.grpc.ClientInterceptors$InterceptorChannel.newCall(ClientInterceptors.java:104)
	at io.grpc.internal.ManagedChannelImpl.newCall(ManagedChannelImpl.java:636)
	at gopkg.in.bblfsh.sdk.v1.protocol.generated.ProtocolServiceGrpc$ProtocolServiceBlockingStub.parse(ProtocolServiceGrpc.scala:61)
	at org.bblfsh.client.BblfshClient.parse(BblfshClient.scala:30)
	at tech.sourced.engine.util.Bblfsh$.extractUAST(Bblfsh.scala:80)
	at tech.sourced.engine.udf.ExtractUASTsUDF$class.extractUASTs(ExtractUASTsUDF.scala:17)
	at tech.sourced.engine.udf.ExtractUASTsUDF$.extractUASTs(ExtractUASTsUDF.scala:24)
	at tech.sourced.engine.package$EngineDataFrame$$anon$2.call(package.scala:395)
	at tech.sourced.engine.package$EngineDataFrame$$anon$2.call(package.scala:377)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown Source)
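
A quick way to pin down such conflicts (an illustrative helper, not part of the engine codebase) is to ask the JVM which jar a class was actually loaded from, to tell Spark's bundled Guava apart from the one pulled in by the engine package:

```scala
// Illustrative diagnostic: report which jar a class was loaded from.
// Bootstrap classes (java.*) have no code source, so we handle null.
def jarOf(className: String): String =
  Option(Class.forName(className).getProtectionDomain.getCodeSource)
    .map(_.getLocation.toString)
    .getOrElse("(bootstrap classpath)")

// In spark-shell, compare the driver's answer with an executor's, e.g.:
// jarOf("com.google.common.base.Stopwatch")
// spark.range(1).map(_ => jarOf("com.google.common.base.Stopwatch")).first
```

If the driver and the executors report different jars, the classpath they see is not the same, which is consistent with the --packages vs --jars difference observed below.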

Context

This mimics the file-deduplication workflow we have in Gemini's hash job. The ability to reproduce it in spark-shell is crucial for debugging.

Possible Solution

  • Update the build so the final fat jar does not call the un-shaded version of Guava
  • Add a TravisCI profile that runs this query against an actual local Apache Spark standalone cluster
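
The first item could look roughly like the following build.sbt fragment — a sketch only, assuming the engine is built with the sbt-assembly plugin; the relocation target package name is illustrative:

```scala
// build.sbt fragment (sketch, assumes sbt-assembly): relocate Guava inside
// the fat jar so grpc resolves the bundled copy instead of the older Guava
// that ships with the Spark distribution.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "tech.sourced.engine.shaded.guava.@1").inAll
)
```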

Your Environment (for bugs)

  • Spark version: 2.2.0
  • Engine version: 0.6.3
  • Operating System and version: tested on Linux and macOS

bzz — Jun 18 '18 09:06

Update: this seems to be related to how --packages works in Apache Spark 😕

If spark-shell is started with:

  • --packages "tech.sourced:engine:0.6.3" -> java.lang.NoSuchMethodError
  • --jars <path-to-engine>/target/engine-0.6.3.jar -> works as expected
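
If Spark's own (older) Guava is winning on the executor classpath, one possible workaround — a sketch, not verified in this thread — is Spark's experimental userClassPathFirst setting:

```shell
# Sketch (untested here): ask executors to prefer classes from the
# user-supplied jar over Spark's own copies.
$SPARK_HOME/bin/spark-shell \
  --master "spark://$MASTER_HOST:7077" \
  --jars <path-to-engine>/target/engine-0.6.3.jar \
  --conf spark.executor.userClassPathFirst=true
```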

bzz — Jun 18 '18 09:06

--packages used to work for me. This seems to be the usual dependency hell with conflicting Guava versions.

smola — Jun 18 '18 10:06

@bzz maybe it is related to some kind of cache used by --packages?

Maybe deleting the .ivy2 folder solves the problem.
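
Concretely, that would mean something like the following (assuming the default Ivy cache location, ~/.ivy2), so that --packages re-resolves the artifact from scratch:

```shell
# Illustrative: wipe the locally cached resolution of the engine artifact
# and the jars --packages staged from it (default Ivy location assumed).
rm -rf "$HOME/.ivy2/cache/tech.sourced" "$HOME/.ivy2/jars"
```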

ajnavarro — Jun 18 '18 10:06

True, @smola, --packages used to work for me as well.

@ajnavarro Hmmm.. But it worked neither on my local machine nor from a new pod on the staging pipeline cluster. Or do you mean some cache on the Spark master side?

A quick verification on a new pod with an empty cache and a local standalone cluster:

kubectl run -i --tty spark-new --image=srcd/spark:2.2.0_v2 --generator="run-pod/v1" --command -- /bin/bash

whoami
ls -la /root

export SPARK_HOME="/opt/spark-2.2.0-bin-hadoop2.7"
export MASTER_HOST=127.0.0.1
$SPARK_HOME/sbin/start-master.sh -h $MASTER_HOST -p 7077
$SPARK_HOME/sbin/start-slave.sh $MASTER_HOST:7077
$SPARK_HOME/bin/spark-shell --master "spark://$MASTER_HOST:7077" --packages "tech.sourced:engine:0.6.4"

and then

import tech.sourced.engine._
val path = "hdfs://hdfs-namenode/pga/siva/latest/ff/"
val engine = Engine(spark, path, "siva")

val repos = engine.getRepositories
val files = repos.getHEAD
     .getCommits
     .getTreeEntries
     .getBlobs
val uast = files.extractUASTs

uast.count

results in

18/06/18 12:07:43 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 2, 10.2.15.90, executor 0): java.lang.NoSuchMethodError: com.google.protobuf.Descriptors$Descriptor.getOneofs()Ljava/util/List;
	at com.google.protobuf.GeneratedMessageV3$FieldAccessorTable.<init>(GeneratedMessageV3.java:1727)
	at com.google.protobuf.DurationProto.<clinit>(DurationProto.java:52)
	at com.google.protobuf.duration.DurationProto$.javaDescriptor$lzycompute(DurationProto.scala:26)
	at com.google.protobuf.duration.DurationProto$.javaDescriptor(DurationProto.scala:25)
	at gopkg.in.bblfsh.sdk.v1.protocol.generated.GeneratedProto$.javaDescriptor$lzycompute(GeneratedProto.scala:63)
	at gopkg.in.bblfsh.sdk.v1.protocol.generated.GeneratedProto$.javaDescriptor(GeneratedProto.scala:59)
	at gopkg.in.bblfsh.sdk.v1.protocol.generated.ProtocolServiceGrpc$.<init>(ProtocolServiceGrpc.scala:30)
	at gopkg.in.bblfsh.sdk.v1.protocol.generated.ProtocolServiceGrpc$.<clinit>(ProtocolServiceGrpc.scala)
	at org.bblfsh.client.BblfshClient.<init>(BblfshClient.scala:20)

bzz — Jun 18 '18 11:06

@bzz it's a cache on the master (or workers) side. I used to have the same problem, and removing the cache helped. Reference: https://github.com/src-d/engine/issues/389

smacker — Jun 19 '18 08:06