
Getting java.base/java.util.ArrayDeque.removeFirst(ArrayDeque.java:363) exception intermittently

Open shraddha-ca opened this issue 1 year ago • 9 comments

Driver version

2.1.0.26

Redshift version

PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.68555

Client Operating System

linux/amd64

Table schema

Problem description

We use a Kafka Connect Docker Image deployed over ECS with a JDBC sink connector v10.6.0 and the Redshift JDBC driver to transfer data from MSK into a Redshift Serverless instance. This intermittently fails with the following stack trace:

Caused by: java.util.NoSuchElementException
	at java.base/java.util.ArrayDeque.removeFirst(ArrayDeque.java:363)
	at com.amazon.redshift.core.v3.QueryExecutorImpl.processResultsOnThread(QueryExecutorImpl.java:2368)
	at com.amazon.redshift.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1881)
	at com.amazon.redshift.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1873)
	at com.amazon.redshift.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:595)
	at com.amazon.redshift.jdbc.RedshiftStatementImpl.internalExecuteBatch(RedshiftStatementImpl.java:988)
	at com.amazon.redshift.jdbc.RedshiftStatementImpl.executeBatch(RedshiftStatementImpl.java:1016)
	at com.amazon.redshift.jdbc.RedshiftPreparedStatement.executeBatch(RedshiftPreparedStatement.java:1765)
	at io.confluent.connect.jdbc.sink.BufferedRecords.executeUpdates(BufferedRecords.java:196)
	at io.confluent.connect.jdbc.sink.BufferedRecords.flush(BufferedRecords.java:186)
	at io.confluent.connect.jdbc.sink.JdbcDbWriter.write(JdbcDbWriter.java:80)
	at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:84)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:587)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:336)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:237)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:206)
	at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:202)
	at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:257)
	at org.apache.kafka.connect.runtime.isolation.Plugins.lambda$withClassLoader$1(Plugins.java:181)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

The failure occurs when an INSERT query runs after a long idle period. Redshift Serverless logs show that the query itself succeeds, and the connector then fails. Restarting the task usually resolves the issue, but because the driver does not throw a java.sql.SQLException, we cannot automate the restarts.
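As a client-side stopgap, one option is to wrap the batch call so that unchecked driver exceptions surface as java.sql.SQLException, which retry/restart logic can then act on. This is a hedged sketch, not part of the connector or driver; BatchGuard and callSafely are hypothetical names:

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical workaround sketch: re-wrap unchecked exceptions thrown by the
// driver (e.g. the NoSuchElementException above) as SQLException, so that
// callers which only handle SQLException can trigger a retry or restart.
public class BatchGuard {
    public static <T> T callSafely(Callable<T> batchAction) throws SQLException {
        try {
            return batchAction.call();
        } catch (SQLException e) {
            throw e; // already the type callers expect
        } catch (Exception e) {
            throw new SQLException("Driver threw an unchecked exception during batch execution", e);
        }
    }
}
```

A caller would then invoke, for example, `BatchGuard.callSafely(() -> stmt.executeBatch())` instead of calling `executeBatch()` directly.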

Reproduction code

The following configurations are set on the connector, all other values are left as default:

connection.url: jdbc:redshift://host:port/dbname
connection.user: username
connection.password: password
table.name.format: tablename

The connection parameters are fetched here and passed on to the driver here.

shraddha-ca avatar Nov 14 '23 04:11 shraddha-ca

Hi @shraddha-ca thanks for opening this issue. I will take a look and get back to you once I have an update

bhvkshah avatar Nov 21 '23 17:11 bhvkshah

Hi, @bhvkshah any update on the issue?

shawnazhaoxy avatar Jan 11 '24 22:01 shawnazhaoxy

Hi @shawnazhaoxy @shraddha-ca , thanks for following up. I believe this could be an issue in the server. Are you able to provide DEBUG-level driver logs, as well as a Wireshark trace during a repro, which I could then pass on to the internal server team?
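For reference, driver-side logging can typically be enabled through connection properties. A hedged example in the same format as the connector config above (property names per the Redshift JDBC 2.x documentation; the level value and log path are placeholders and should be checked against the driver docs):

```
connection.url: jdbc:redshift://host:port/dbname?LogLevel=6&LogPath=/tmp/redshift-driver-logs
```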

bhvkshah avatar Jan 12 '24 14:01 bhvkshah

@shraddha-ca any updates on providing more info on this?

bhvkshah avatar Feb 22 '24 20:02 bhvkshah

@bhvkshah apologies for the delay. We didn't have an easy way to get DEBUG logs; however, I have attached the logs from the latest failure for:

Caused by: java.util.NoSuchElementException
	at java.base/java.util.ArrayDeque.removeFirst(ArrayDeque.java:363)
	at com.amazon.redshift.core.v3.QueryExecutorImpl.processResultsOnThread(QueryExecutorImpl.java:2368)
	(remainder of the stack trace is identical to the one in the original report)

extract-2024-06-05T13_32_04.239Z.csv

shraddha-ca avatar Jun 05 '24 13:06 shraddha-ca

My suspicion is that the portal has already been closed by the time the driver tries to read results, mostly from looking at these lines (shown in reverse chronological order, newest first):

Date Host Service Message
2024-06-05T12:03:13.711Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.711 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.receiveRFQ: <=BE ReadyForQuery(I)
2024-06-05T12:03:13.711Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.711 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.processSyncOnClose: <=BE CloseComplete
2024-06-05T12:03:13.711Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.711 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.processSyncOnClose: <=BE CloseComplete
2024-06-05T12:03:13.709Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.709 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.sendSync: FE=> Sync
2024-06-05T12:03:13.709Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.709 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.sendFlush: FE=> Flush
2024-06-05T12:03:13.709Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.709 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.sendClosePortal: FE=> ClosePortal(C_10-1848892317217922)
2024-06-05T12:03:13.709Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.709 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.sendCloseStatement: FE=> CloseStatement(S_9-1848892317145757)
2024-06-05T12:03:13.709Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.709 FUNCTION [57817 task-thread] com.amazon.redshift.jdbc.RedshiftStatementImpl.cancel: Return
2024-06-05T12:03:13.709Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.709 ERROR [57817 task-thread] com.amazon.redshift.jdbc.RedshiftStatementImpl.cancel: statementState is StatementCancelState.IDLE
2024-06-05T12:03:13.709Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.709 FUNCTION [57817 task-thread] com.amazon.redshift.jdbc.RedshiftStatementImpl.cancel: Enter (value)
2024-06-05T12:03:13.709Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.709 FUNCTION [57817 task-thread] com.amazon.redshift.jdbc.RedshiftStatementImpl.close: Enter (value)
2024-06-05T12:03:13.709Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.709 FUNCTION [57817 task-thread] com.amazon.redshift.jdbc.RedshiftPreparedStatement.executeUpdate: Return 0
2024-06-05T12:03:13.708Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.708 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.processResultsOnThread: ReadyForQuery will exit from processResultsOnThread loop
2024-06-05T12:03:13.708Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.708 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.receiveRFQ: <=BE ReadyForQuery(I)
2024-06-05T12:03:13.708Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.708 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.processResultsOnThread: FE=> Received packet of type:90=Z
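For context on the exception type itself: the failure at the top of the trace is just ArrayDeque's standard contract. The driver pops its queue of pending results with removeFirst(), and on an empty deque that method throws NoSuchElementException, an unchecked exception with no SQL wrapping. A minimal illustration (the queue name here is a stand-in, not the driver's actual field):

```java
import java.util.ArrayDeque;

public class EmptyDequeDemo {
    // removeFirst() on an empty ArrayDeque throws NoSuchElementException,
    // which is a RuntimeException -- nothing converts it to SQLException.
    public static String classify() {
        ArrayDeque<String> pendingResults = new ArrayDeque<>(); // stand-in for the driver's internal queue
        try {
            return pendingResults.removeFirst();
        } catch (java.util.NoSuchElementException e) {
            return e.getClass().getSimpleName();
        }
    }
}
```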

shraddha-ca avatar Jun 05 '24 13:06 shraddha-ca

@bhvkshah Is there any update on this? Even throwing a java.sql.SQLException will suffice for our use case. This will trigger an automatic restart for us.

shraddha-ca avatar Jun 21 '24 07:06 shraddha-ca

Hey @shraddha-ca , I no longer work on this project. Pinging @bsharifi who does.

bhvkshah avatar Jun 24 '24 14:06 bhvkshah

Hi there, we're hitting this issue in Metabase as well, @bsharifi.

paoliniluis avatar Aug 27 '24 17:08 paoliniluis