
Error while using spark-redshift jar


Hi,

I'm getting the error below while using the jar to integrate Redshift with Spark locally.
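
For reference, the read is just the standard spark-redshift usage, roughly along these lines (simplified sketch with placeholder option values; the real RedshiftToSpark.scala has the actual settings):

package simpleSample

import org.apache.spark.sql.SparkSession

object RedshiftToSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RedshiftToSpark")
      .master("local[*]")
      .getOrCreate()

    // Standard spark-redshift read: JDBC URL, source table, and an S3 temp dir.
    val df = spark.read
      .format("com.databricks.spark.redshift")
      .option("url", "jdbc:redshift://<host>:5439/<db>?user=<user>&password=<password>")
      .option("dbtable", "<table>")
      .option("tempdir", "s3n://<bucket>/tmp/")
      .load()

    df.show() // the AbstractMethodError below is thrown here (RedshiftToSpark.scala:53 in the trace)
  }
}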

Exception in thread "main" java.lang.AbstractMethodError: com.databricks.spark.redshift.RedshiftFileFormat.prepareRead(Lorg/apache/spark/sql/SparkSession;Lscala/collection/immutable/Map;Lscala/collection/Seq;)Lscala/collection/immutable/Map;

at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:160)
	at com.databricks.spark.redshift.RedshiftRelation.buildScan(RedshiftRelation.scala:168)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$3.apply(DataSourceStrategy.scala:141)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$3.apply(DataSourceStrategy.scala:141)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:184)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:183)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:257)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:179)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:137)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:60)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:55)
	at org.apache.spark.sql.execution.SparkStrategies$SpecialLimits$.apply(SparkStrategies.scala:54)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:60)
	at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:77)
	at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:75)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:82)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:82)
	at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2462)
	at org.apache.spark.sql.Dataset.head(Dataset.scala:1861)
	at org.apache.spark.sql.Dataset.take(Dataset.scala:2078)
	at org.apache.spark.sql.Dataset.showString(Dataset.scala:240)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:533)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:493)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:502)
	at simpleSample.RedshiftToSpark$.main(RedshiftToSpark.scala:53)
	at simpleSample.RedshiftToSpark.main(RedshiftToSpark.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

I see that the prepareRead method is not present in RedshiftFileFormat.

Thanks & Regards, Ravi

ghost avatar Dec 29 '16 07:12 ghost

Which version of Spark are you using? If you're using 2.1.x then I suspect that changes to internal APIs may have broken spark-redshift, in which case we'll need to make a new release.

JoshRosen avatar Dec 29 '16 18:12 JoshRosen

Actually, looking a little more closely: since this problem relates to prepareRead, I don't think it's a 2.1.x issue, because that method had been completely removed from Spark by that point (see https://github.com/apache/spark/pull/13698). According to https://issues.apache.org/jira/browse/SPARK-15983, that change went into 2.0.

Thus: are you using a newer version of spark-redshift with Spark 1.x? You'll need to use a 1.x version of this library with Spark 1.x; newer versions won't work with Spark 1.x.
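
For example, a consistent pairing in build.sbt looks roughly like this (versions are illustrative; the point is simply that the spark-redshift line has to match the Spark line you run against):

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"      % "2.0.1",  // the Spark line you actually run
  "com.databricks"   %% "spark-redshift" % "2.0.1"   // a 2.x spark-redshift to match it
)

For Spark 1.x you would instead pick a 1.x spark-redshift release.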

JoshRosen avatar Dec 29 '16 18:12 JoshRosen

I'm getting the same exception (with a different stack trace), but only when I switch from Spark 2.0.1 to Spark 2.1.0 / Hadoop 2.7 / Mesos / spark-redshift_2.11-2.0.1.jar / RedshiftJDBC41-1.1.17.1017.jar:

48f7-81e8-02403dbc2b57-S107): java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.OutputWriterFactory.getFileExtension(Lorg/apache/hadoop/mapreduce/TaskAttemptContext;)Ljava/lang/String;
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.<init>(FileFormatWriter.scala:232)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:182)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

lminer avatar Jan 06 '17 19:01 lminer

I'm getting this error as well with Spark 2.1.0. I've also tried 3.0.0-preview1 of this library; previously I was using 2.0.0.

java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.OutputWriterFactory.getFileExtension(Lorg/apache/hadoop/mapreduce/TaskAttemptContext;)Ljava/lang/String;
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.<init>(FileFormatWriter.scala:232)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:182)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Edit: Here's a bit bigger stack trace that may help.

17/01/09 22:45:34 ERROR FileFormatWriter: Aborting job null.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 1.0 failed 1 times, most recent failure: Lost task 5.0 in stage 1.0 (TID 6, localhost, executor driver): java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.OutputWriterFactory.getFileExtension(Lorg/apache/hadoop/mapreduce/TaskAttemptContext;)Ljava/lang/String;
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.<init>(FileFormatWriter.scala:232)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:182)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:127)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:121)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:121)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:121)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:101)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
	at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:492)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:198)
	at com.databricks.spark.redshift.RedshiftWriter.unloadData(RedshiftWriter.scala:295)
	at com.databricks.spark.redshift.RedshiftWriter.saveToRedshift(RedshiftWriter.scala:392)
	at com.databricks.spark.redshift.DefaultSource.createRelation(DefaultSource.scala:108)
	at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
	at org.lucidhq.SFRedshiftETL.SFObject.redshiftLoad(SFObject.scala:115)
	at org.lucidhq.SFRedshiftETL.SFObject.load(SFObject.scala:256)
	at org.lucidhq.SFRedshiftETL.SFRedshiftETL$$anonfun$run$1.apply(main.scala:61)
	at org.lucidhq.SFRedshiftETL.SFRedshiftETL$$anonfun$run$1.apply(main.scala:44)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.lucidhq.SFRedshiftETL.SFRedshiftETL$.run(main.scala:44)
	at org.lucidhq.SFRedshiftETL.SFRedshiftETL$$anonfun$main$1.apply(main.scala:83)
	at org.lucidhq.SFRedshiftETL.SFRedshiftETL$$anonfun$main$1.apply(main.scala:83)
	at scala.Option.map(Option.scala:146)
	at org.lucidhq.SFRedshiftETL.SFRedshiftETL$.main(main.scala:83)
	at org.lucidhq.SFRedshiftETL.SFRedshiftETL.main(main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.OutputWriterFactory.getFileExtension(Lorg/apache/hadoop/mapreduce/TaskAttemptContext;)Ljava/lang/String;
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.<init>(FileFormatWriter.scala:232)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:182)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

schwartzmx avatar Jan 10 '17 04:01 schwartzmx

@JoshRosen Any plans to make a new release soon? Seems like it's needed to use this with 2.1.0.

lminer avatar Jan 13 '17 22:01 lminer

@JoshRosen We hit the same issue: after upgrading from Spark 2.0.2 to Spark 2.1.0, our pipeline started throwing exceptions with the same cause:

Caused by: java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.OutputWriterFactory.getFileExtension(Lorg/apache/hadoop/mapreduce/TaskAttemptContext;)Ljava/lang/String;

We are using spark-redshift 2.0.1 with https://s3.amazonaws.com/redshift-downloads/drivers/RedshiftJDBC41-1.1.17.1017.jar

elyast avatar Jan 20 '17 00:01 elyast

@elyast I hit the same issue using Spark 2.1.0.

I asked this question on Stack Overflow.

Did you have the same issue with Spark 2.0.2? I'm not able to make spark-redshift work in 2.0.2; any help would be appreciated.

carlos-eduardo-gb avatar Jan 20 '17 13:01 carlos-eduardo-gb

Found the root cause: Spark 2.1 added a new method to the org.apache.spark.sql.execution.datasources.OutputWriterFactory interface:

def getFileExtension(context: TaskAttemptContext): String

which is not implemented in spark-avro, hence the AbstractMethodError.
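
In code terms, the writer factory now has to provide something like the following (a sketch only; the class name here is illustrative, and the real fix belongs in spark-avro's own factory):

import org.apache.hadoop.mapreduce.TaskAttemptContext
import org.apache.spark.sql.execution.datasources.{OutputWriter, OutputWriterFactory}
import org.apache.spark.sql.types.StructType

class ExampleOutputWriterFactory extends OutputWriterFactory {

  // New abstract method in Spark 2.1: a factory compiled against Spark 2.0
  // lacks it, so FileFormatWriter fails at runtime with AbstractMethodError.
  override def getFileExtension(context: TaskAttemptContext): String = ".avro"

  // Spark 2.1 signature (the bucketId parameter from 2.0 was removed).
  override def newInstance(
      path: String,
      dataSchema: StructType,
      context: TaskAttemptContext): OutputWriter =
    throw new UnsupportedOperationException("writer construction omitted in this sketch")
}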

elyast avatar Jan 20 '17 19:01 elyast

Ran into the same issue with Spark 2.1.0. Is there a workaround (besides bumping the Spark version down)?

apurva-sharma avatar Jan 29 '17 19:01 apurva-sharma

@apurva-sharma You can build this patch, https://github.com/databricks/spark-avro/pull/206, and replace the spark-avro dependency with that custom build; at least it worked for us.
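
If you go that route, the build change is roughly the following (a sketch; the version string is whatever sbt publishLocal reports for your local build):

// After cloning spark-avro, checking out the patched branch and running
// sbt publishLocal, override the transitive spark-avro with the local artifact.
dependencyOverrides += "com.databricks" % "spark-avro_2.11" % "<locally-published-version>"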

elyast avatar Jan 30 '17 02:01 elyast

@elyast Thanks for that. I can verify that monkey-patching spark-avro as above worked for me with Spark 2.1.0. It would be great if this gets merged.

apurva-sharma avatar Jan 30 '17 22:01 apurva-sharma

@apurva-sharma +1

elyast avatar Jan 30 '17 22:01 elyast

Looks like spark-avro was fixed. Any updates here?

alexander-branevskiy avatar Feb 08 '17 05:02 alexander-branevskiy

Any updates on when this issue will be fixed?

sanketvega avatar Feb 13 '17 07:02 sanketvega

^ @JoshRosen

diegorep avatar Feb 13 '17 19:02 diegorep

Atm this driver is completely unusable ...

caeleth avatar Feb 24 '17 18:02 caeleth

Fixed mine by adding this line to my project's build.sbt:

dependencyOverrides += spark_avro_320

where

val spark_avro_320: ModuleID = "com.databricks" % "spark-avro_2.11" % "3.2.0"

I am using spark-redshift 3 btw...

Hopefully this library will be actively supported in the long run; it looks like it has not been updated for several months.

hnfmr avatar Feb 25 '17 04:02 hnfmr

I've tried what @hnfmr suggests, but I am still running into this issue.

mrdmnd avatar Mar 09 '17 08:03 mrdmnd

@mrdmnd To be specific, I am using the Spark-Redshift v3.0.0-preview1 and my build.sbt looks like:

lazy val app = (project in file("app"))
  .settings(commonSettings: _*)
  .settings(
    libraryDependencies += "com.databricks" % "spark-redshift_2.11" % "3.0.0-preview1",
    dependencyOverrides += "com.databricks" % "spark-avro_2.11" % "3.2.0"
  )

BTW, I am using Spark 2.1.0... hope this helps

hnfmr avatar Mar 09 '17 09:03 hnfmr

@elyast Can you please describe what you did? My guess:

  1. Clone the spark-avro repo and checkout the commit of that PR (post-merge).
  2. Build the jar.
  3. Use SBT to use this jar. (Do you know how to do this offhand?)

Thank you!

wafisher avatar Mar 14 '17 22:03 wafisher

Also seeing this issue here. @hnfmr's fix is working for me now, but it would be nice to have this properly fixed. Spark is a popular tool and Redshift usage is only going to grow.

Exact workaround was to add the following to my build.sbt file:

// Temporary fix for: https://github.com/databricks/spark-redshift/issues/315
dependencyOverrides += "com.databricks" % "spark-avro_2.11" % "3.2.0"

sadowski avatar Mar 14 '17 23:03 sadowski

Yeah, I had a minor typo. Can confirm that this works.

mrdmnd avatar Mar 22 '17 07:03 mrdmnd

I use Zeppelin to do ETL to redshift and encountered the same AbstractMethodError.

Configuring the Spark interpreter to exclude com.databricks:spark-avro_2.11:3.0.0 while depending on com.databricks:spark-redshift_2.11:2.0.1, and then specifying a separate dependency on com.databricks:spark-avro_2.11:3.2.0, works for me.
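
For an sbt-based job, the equivalent exclude/re-add would look roughly like this (a sketch; adjust versions to your setup):

libraryDependencies ++= Seq(
  // Pull in spark-redshift but drop its transitive spark-avro 3.0.0...
  ("com.databricks" % "spark-redshift_2.11" % "2.0.1")
    .exclude("com.databricks", "spark-avro_2.11"),
  // ...and depend on spark-avro 3.2.0 explicitly instead.
  "com.databricks" % "spark-avro_2.11" % "3.2.0"
)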

Thanks a lot!

cockroachzl avatar Mar 23 '17 06:03 cockroachzl

Yes! Just update or replace spark-avro_2.11-3.1.0.jar with spark-avro_2.11-3.2.0.jar and this problem should be solved now.

https://mvnrepository.com/artifact/com.databricks/spark-avro_2.11/3.2.0

Aung-Myint-Thein avatar Apr 12 '17 10:04 Aung-Myint-Thein

Hi, I have the same problem. I am using Spark 2.1.0 and have tried spark-redshift 3.0.0-preview1, 2.0.1, and 2.0.0. All of them give the same error.

java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.OutputWriterFactory.getFileExtension(Lorg/apache/hadoop/mapreduce/TaskAttemptContext;)Ljava/lang/String;
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.<init>(FileFormatWriter.scala:232)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:182)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
17/04/24 21:12:15 ERROR TaskSetManager: Task 1 in stage 2.0 failed 1 times; aborting job
17/04/24 21:12:15 ERROR FileFormatWriter: Aborting job null.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2.0 failed 1 times, most recent failure: Lost task 1.0 in stage 2.0 (TID 202, localhost, executor driver): java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.OutputWriterFactory.getFileExtension(Lorg/apache/hadoop/mapreduce/TaskAttemptContext;)Ljava/lang/String;
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.<init>(FileFormatWriter.scala:232)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:182)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

cshintov avatar Apr 24 '17 15:04 cshintov

I have the same problem, and I am using code from the Spark branch-2.2. spark-avro was already spark-avro_2.11-3.2.0.jar.

Caused by: java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.OutputWriter.write(Lorg/apache/spark/sql/catalyst/InternalRow;)V
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:318)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:249)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247)
  at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:252)

giaosudau avatar Jun 13 '17 08:06 giaosudau

Any updates on this one? It seems that the underlying dependency (spark-avro_2.11-3.2.0) has resolved this issue. Instead of having everyone depend on the workaround, could the owner release a version that depends on spark-avro 3.2.0?

davidzhao avatar Jun 17 '17 06:06 davidzhao

It seems this issue and repo are getting stale, would love to have this updated. @JoshRosen would it be possible to open this up to new contributors?

schwartzmx avatar Jun 20 '17 19:06 schwartzmx

Any updates on this? I'm using this through pyspark and am unable to try the workarounds suggested.

tylermichael avatar Aug 18 '17 20:08 tylermichael

Looks like this issue is going to be fixed in the next version of the spark-avro library: https://github.com/databricks/spark-avro/pull/242. It was merged to master 8 days ago.

dnaumenko avatar Aug 25 '17 11:08 dnaumenko