com.databricks.spark.redshift.RedshiftFileFormat.supportDataType(Lorg/apache/spark/sql/types/DataType;Z)Z
I'm attempting to run a simple query against a Redshift table and am receiving the following error.
Exception in thread "main" java.lang.AbstractMethodError: com.databricks.spark.redshift.RedshiftFileFormat.supportDataType(Lorg/apache/spark/sql/types/DataType;Z)Z
at org.apache.spark.sql.execution.datasources.DataSourceUtils$$anonfun$verifySchema$1.apply(DataSourceUtils.scala:48)
at org.apache.spark.sql.execution.datasources.DataSourceUtils$$anonfun$verifySchema$1.apply(DataSourceUtils.scala:47)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:99)
at org.apache.spark.sql.execution.datasources.DataSourceUtils$.verifySchema(DataSourceUtils.scala:47)
at org.apache.spark.sql.execution.datasources.DataSourceUtils$.verifyReadSchema(DataSourceUtils.scala:39)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:400)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at com.databricks.spark.redshift.RedshiftRelation.buildScan(RedshiftRelation.scala:168)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:326)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:325)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProjectRaw(DataSourceStrategy.scala:403)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProject(DataSourceStrategy.scala:321)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:289)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3360)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2545)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2759)
at org.apache.spark.sql.Dataset.getRows(Dataset.scala:255)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:292)
at org.apache.spark.sql.Dataset.show(Dataset.scala:746)
at org.apache.spark.sql.Dataset.show(Dataset.scala:705)
at org.apache.spark.sql.Dataset.show(Dataset.scala:714)
at LogChecks$.main(LogChecks.scala:90)
at LogChecks.main(LogChecks.scala)
18/12/12 10:14:24 INFO SparkContext: Invoking stop() from shutdown hook
18/12/12 10:14:24 INFO SparkUI: Stopped Spark web UI at http://10.75.8.183:4040
18/12/12 10:14:24 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/12/12 10:14:24 INFO MemoryStore: MemoryStore cleared
18/12/12 10:14:24 INFO BlockManager: BlockManager stopped
18/12/12 10:14:24 INFO BlockManagerMaster: BlockManagerMaster stopped
18/12/12 10:14:24 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/12/12 10:14:24 INFO SparkContext: Successfully stopped SparkContext
18/12/12 10:14:24 INFO ShutdownHookManager: Shutdown hook called
18/12/12 10:14:24 INFO ShutdownHookManager: Deleting directory /private/var/folders/01/333yn1tn7rb93sspgwp_z_p804ccgt/T/spark-7c91c661-11e5-4b22-8d9f-2afe851dcdd6
CheckRedshift.scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}

object CheckRedshift {
  def main(args: Array[String]): Unit = {
    val awsAccessKeyId = "asdfasdf"
    val awsSecretAccessKey = "asdlfjakfalksdf"
    val redshiftDBName = "dbName"
    val redshiftUserId = "*****"
    val redshiftPassword = "*****"
    val redshifturl = "blah-blah.us-west-1.redshift.amazonaws.com:5555"
    val jdbcURL = s"jdbc:redshift://$redshifturl/$redshiftDBName?user=$redshiftUserId&password=$redshiftPassword"
    println(jdbcURL)

    val sc = new SparkContext(new SparkConf().setAppName("SparkSQL").setMaster("local"))

    // Configure SparkContext to communicate with AWS
    val tempS3Dir = "s3a://mys3bucket/temp"
    sc.hadoopConfiguration.set("fs.s3a.access.key", awsAccessKeyId)
    sc.hadoopConfiguration.set("fs.s3a.secret.key", awsSecretAccessKey)

    // Create the SQL Context
    val sqlContext = SparkSession.builder.config(sc.getConf).getOrCreate()
    import sqlContext.implicits._

    val myQuery =
      """SELECT * FROM public.redshift_table
        WHERE date_time >= '2018-01-01'
        AND date_time < '2018-01-02'"""

    val df = sqlContext.read
      .format("com.databricks.spark.redshift")
      .option("url", jdbcURL)
      .option("tempdir", tempS3Dir)
      .option("forward_spark_s3_credentials", "true")
      .option("query", myQuery)
      .load()

    df.show()
  }
}
build.sbt
name := "log_check"
version := "0.1"
scalaVersion := "2.11.8"
// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0"
// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0"
libraryDependencies += "org.json4s" %% "json4s-native" % "3.2.9"
//https://github.com/databricks/spark-redshift
resolvers += "jitpack" at "https://jitpack.io"
libraryDependencies += "com.github.databricks" %% "spark-redshift" % "master-SNAPSHOT"
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.6.0"
libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.11.407"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0"
libraryDependencies += "net.java.dev.jets3t" % "jets3t" % "0.9.4"
// https://mvnrepository.com/artifact/com.amazon/redshift-jdbc41
resolvers += "redshift" at "http://redshift-maven-repository.s3-website-us-east-1.amazonaws.com/release"
libraryDependencies += "com.amazon.redshift" % "redshift-jdbc4" % "1.2.10.1009"
// Temporary fix for: https://github.com/databricks/spark-redshift/issues/315
dependencyOverrides += "com.databricks" % "spark-avro_2.11" % "3.2.0"
Same problem here :/
+1
They kindly stopped supporting the package because they now provide it only to Databricks customers. A poor situation, but if we want it we need to maintain it ourselves.
This issue is related to Spark 2.4. A workaround is to use Spark 2.3.1. Unfortunately, it seems nobody is working on adding support for 2.4; see #418.
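For reference, a minimal sketch of that workaround in the build.sbt posted above (only the two Spark lines change; everything else stays the same):

// Workaround sketch: pin both Spark artifacts to a pre-2.4 release,
// where FileFormat does not yet declare supportDataType.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.1"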
It looks like this fork https://github.com/Yelp/spark-redshift added support for Spark 2.4, but I didn't test it. To be checked.
I tried building the Yelp fork... still getting this error. I'd appreciate any thoughts on next steps to get this working.
Same issue here
The Yelp fork is not compatible with 2.4 (yet); we're working on it. But it seems that udemy's fork might be: https://github.com/udemy/spark-redshift
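If you want to try that fork from sbt, a sketch following the same JitPack pattern as the build.sbt above could look like this; the coordinates and version are my assumption and I haven't verified that JitPack builds this fork:

// Unverified sketch: pull the udemy fork through JitPack instead of the
// databricks master snapshot; substitute a published tag or commit hash
// for "master-SNAPSHOT" if the fork provides one.
resolvers += "jitpack" at "https://jitpack.io"
libraryDependencies += "com.github.udemy" %% "spark-redshift" % "master-SNAPSHOT"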
Re-compiling the library against the latest Spark version should resolve the problem. Spark 2.4 adds a new method, supportDataType, to FileFormat.scala, and a jar compiled against an older Spark doesn't implement it, hence the AbstractMethodError.
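To illustrate what is missing, here is a hypothetical, do-nothing FileFormat that compiles against Spark 2.4 and shows the member the AbstractMethodError refers to. This is not the spark-redshift source, and the real fix is rebuilding the library against Spark 2.4 so this method exists on RedshiftFileFormat; the sketch only shows the 2.4 API shape as I understand it.

import org.apache.hadoop.fs.FileStatus
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.{FileFormat, OutputWriterFactory}
import org.apache.spark.sql.types.{DataType, StructType}

// Illustrative stub only: a bare FileFormat built against Spark 2.4.
class DummyFileFormat extends FileFormat {
  // Required abstract members of FileFormat, stubbed out.
  override def inferSchema(
      sparkSession: SparkSession,
      options: Map[String, String],
      files: Seq[FileStatus]): Option[StructType] = None

  override def prepareWrite(
      sparkSession: SparkSession,
      job: Job,
      options: Map[String, String],
      dataSchema: StructType): OutputWriterFactory =
    throw new UnsupportedOperationException("writes are not supported in this sketch")

  // The member introduced in Spark 2.4. A class file compiled against Spark 2.3
  // has no implementation of it, which is what triggers the AbstractMethodError.
  override def supportDataType(dataType: DataType, isReadPath: Boolean): Boolean = true
}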
So, we've started a community edition of spark-redshift that works with Spark 2.4. Feel free to try it out! If you do, it would be very helpful to receive your feedback. Pull requests are very welcome too.
https://github.com/spark-redshift-community/spark-redshift
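A sketch of how to pull it into the build.sbt from this thread, mirroring the JitPack dependency already used there; the coordinates below are my assumption, so check the repository's README for the officially published artifacts:

// Unverified sketch: swap the databricks snapshot for the community fork.
resolvers += "jitpack" at "https://jitpack.io"
libraryDependencies += "com.github.spark-redshift-community" %% "spark-redshift" % "master-SNAPSHOT"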
pinging @andybrnr, @Bennyelg, @julienbachmann who were having issues with 2.4