java.sql.SQLException: [Amazon][JDBC](11380) Null pointer exception.
I get this exception seemingly at random, at different points in my code. Every occurrence happens while I'm reading from Redshift with a query into a Spark DataFrame.
What makes it extra bizarre is that about half the time this exception is thrown, it comes with no stack trace at all; the log literally just says "java.sql.SQLException: [Amazon]JDBC Null pointer exception." and nothing below it (and no, there is nothing wrong with the logging setup etc.).
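For reference, the reads in question look roughly like this (the URL, query, and credentials below are placeholders, not the real code):

```scala
// Minimal sketch of the kind of JDBC read that triggers the exception.
// All connection details here are illustrative placeholders.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:redshift://my-cluster.example.us-east-1.redshift.amazonaws.com:5439/mydb")
  .option("driver", "com.amazon.redshift.jdbc.Driver")
  .option("dbtable", "(SELECT id, value FROM my_schema.my_table) AS subquery")
  .option("user", "<user>")
  .option("password", "<password>")
  .load()
```

The exception surfaces when `load()` resolves the table schema, which is where Spark first opens a JDBC connection.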
Here's one instance of a full stacktrace:
java.sql.SQLException: [Amazon][JDBC](11380) Null pointer exception.
com.amazon.redshift.client.PGMessagingContext.handleMessage(Unknown Source)
com.amazon.jdbc.communications.InboundMessagesPipeline.getNextMessageOfClass(Unknown Source)
com.amazon.redshift.client.PGMessagingContext.checkErrorResponse(Unknown Source)
com.amazon.redshift.client.PGClient.startSession(Unknown Source)
com.amazon.redshift.client.PGClient.<init>(Unknown Source)
com.amazon.redshift.core.PGJDBCConnection.connect(Unknown Source)
com.amazon.jdbc.common.BaseConnectionFactory.doConnect(Unknown Source)
com.amazon.jdbc.common.AbstractDriver.connect(Unknown Source)
org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.connect(DriverWrapper.scala:45)
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:61)
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:52)
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:113)
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:47)
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
com.zcking.core.Database$class.read(Database.scala:210)
com.zcking.core.Database$.read(Database.scala:1268)
com.zcking.flow.MyFlow$$anonfun$runFlow$1.apply(MyFlow.scala:84)
com.zcking.flow.MyFlow$$anonfun$runFlow$1.apply(MyFlow.scala:40)
org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
scala.util.Try$.apply(Try.scala:192)
org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257)
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
This happens roughly once a day. Any idea what's going on here?
@zcking Did you solve this?
No, still having this issue. Sadly it doesn't look like @databricks cares about this codebase anymore...
In any case, I did trace the issue down to the AWS Redshift JDBC driver itself, not the spark-redshift library, although this library could probably implement a workaround.
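Since the stack trace shows the failure inside `PGClient.startSession` (i.e. while the connection is being opened, before any rows are read), one possible workaround is to retry the read when this `SQLException` appears. A rough sketch, assuming the read is side-effect-free to repeat (the helper name and retry policy are mine, not part of any library):

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object RetryingRead {
  // Retry a block up to `maxAttempts` times with exponential backoff.
  // Intended for wrapping the spark.read...load() call that opens the
  // flaky Redshift JDBC connection.
  @tailrec
  def withRetries[T](maxAttempts: Int, delayMs: Long = 1000L)(block: => T): T =
    Try(block) match {
      case Success(result) => result
      case Failure(e: java.sql.SQLException) if maxAttempts > 1 =>
        // The driver NPE surfaces as an SQLException (code 11380);
        // wait, then retry with a doubled delay.
        Thread.sleep(delayMs)
        withRetries(maxAttempts - 1, delayMs * 2)(block)
      case Failure(e) => throw e
    }
}
```

This doesn't fix the driver, but it would keep a daily transient failure from killing a long-running streaming job.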
@zcking did you solve this issue? I'm facing the same problem with a Kafka Redshift sink connector, which also uses the AWS Redshift JDBC driver. I think the root cause is in the AWS Redshift JDBC driver.