Prevent the execution of clearBatch() when executeBatch throws an exception.
When the executeBatch method throws an exception, the InputBasedPreparedStatement.executeAny method clears the contents of the underlying PreparedStatement by invoking clearBatch(). This makes it impossible to build a retry mechanism when the exception is caused by a recoverable issue.
In implementations that process batches of more than one million elements, the most efficient approach is to store the data directly in the driver's PreparedStatement: the rows are kept in serialized form, which saves a large amount of memory. The automatic clearBatch call rules this out and forces the application to keep a redundant copy of the data.
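For context, this is the standard JDBC pattern in question - rows accumulate inside the driver's PreparedStatement rather than in an application-side buffer (the table name, columns, and URL below are illustrative, not from the original report):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsertSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:ch://localhost:8123/default");
             PreparedStatement stmt = conn.prepareStatement(
                     "INSERT INTO events (id, payload) VALUES (?, ?)")) { // hypothetical table
            for (long id = 0; id < 1_000_000; id++) {
                stmt.setLong(1, id);
                stmt.setString(2, "payload-" + id);
                stmt.addBatch(); // the row is serialized into the driver's internal buffer
            }
            stmt.executeBatch(); // if this throws, clearBatch() discards the whole buffer
        }
    }
}
```

If executeBatch fails here, the buffered rows are gone, so the application would have had to keep its own copy of a million rows just in case.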
Adding a property that disables the automatic clearBatch when an exception occurs would let the driver offer more flexible solutions when performing huge numbers of insertions on the server.
@erinerum thank you for reporting the issue.
Btw, what is the initial format the data comes from before it reaches the prepared statement? Is it possible to use the plain Java client with one of the supported formats (like JSONEachRow) to stream the data from the input to the server, bypassing the JDBC complexity?
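For illustration, a rough sketch of such streaming with the client-v2 API mentioned later in this thread (the endpoint, credentials, table name, and sample rows are assumptions):

```java
import com.clickhouse.client.api.Client;
import com.clickhouse.client.api.insert.InsertResponse;
import com.clickhouse.data.ClickHouseFormat;

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class JsonEachRowStreamSketch {
    public static void main(String[] args) throws Exception {
        try (Client client = new Client.Builder()
                .addEndpoint("http://localhost:8123") // hypothetical endpoint
                .setUsername("default")
                .setPassword("")
                .build()) {
            // two JSONEachRow lines; in a real pipeline this stream would come
            // straight from the data source, with no JDBC batch in between
            InputStream data = new ByteArrayInputStream(
                    "{\"id\":1}\n{\"id\":2}\n".getBytes(StandardCharsets.UTF_8));
            try (InsertResponse response =
                    client.insert("events", data, ClickHouseFormat.JSONEachRow).get()) {
                // inspect response metrics here if needed
            }
        }
    }
}
```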
Thank you for the quick response. The use of the JDBC client has been proposed as an additional abstraction layer. However, exploring the use of the Java client could be very interesting. Thank you for the suggestion!
@erinerum btw, there is a Slack Community where you may ask questions about ClickHouse https://clickhouse.com/slack We are glad to help!
Any news on this issue? I ran into a similar problem in a third-party connector that has retry logic implemented.
@AlexanderBobryakov yes, it is on my priority list. Do you have a stack trace for the problem? Thanks!
@chernser Thank you for the quick response. Yes, I will provide all the information.
I am testing on clickhouse-jdbc:0.6.4 (but the problematic code is still present in the latest version).
Testing code block (source: flink_clickhouse_connector):
```java
for (int i = 0; i <= maxRetries; i++) {
    try {
        stmt.executeBatch(); // stmt is a com.clickhouse.jdbc.internal.ClickHouseStatementImpl
        return;
    } catch (Exception exception) {
        if (i >= maxRetries) {
            throw new SQLException(exception);
        }
        // sleep before the next attempt...
    }
}
```
When running the code above, everything is saved successfully.
Next, for testing, I drop a column from the table in ClickHouse while the job is running:

```sql
ALTER TABLE mobile_hits_medium DROP COLUMN platform
```
The code above then catches the corresponding exception thrown inside InputBasedPreparedStatement. Error text:

```
java.sql.SQLException: Code: 16. DB::Exception: No such column platform in table default.mobile_hits_medium (a259ceab-0a9c-4141-92ab-bd91e73739bf). (NO_SUCH_COLUMN_IN_TABLE) (version 24.4.4.113 (official build))
```
On the next write to the table (the retry), the batch being inserted is lost (see the code in InputBasedPreparedStatement), because the previous error always triggers the code that clears the statement. The code block above then throws no errors, and there is no indication that anything went wrong. I tried to decorate InputBasedPreparedStatement, but it doesn't look simple.
@AlexanderBobryakov thank you for the information!
I've looked into the source code of the Flink connector - it doesn't check whether the exception is retriable. In your example, where the schema was changed, the error is not a retriable one.
Besides, if retry logic is implemented, there should be some deduplication (we have a deduplication token for inserts - see the sketch below).
Please be aware of the things above.
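As a hedged illustration of the deduplication token: a retried INSERT can carry the same insert_deduplication_token, so the server drops the duplicate block instead of inserting it twice. The table, values, and token below are assumptions, and deduplication applies to Replicated*MergeTree tables (to plain MergeTree only when non_replicated_deduplication_window is enabled):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DedupTokenSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:ch://localhost:8123/default");
             Statement stmt = conn.createStatement()) {
            // hypothetical token derived from the batch identity; reusing it on a
            // retry lets the server recognize and discard the duplicate block
            String token = "job-42-chunk-7";
            stmt.execute("INSERT INTO events SETTINGS insert_deduplication_token = '"
                    + token + "' VALUES (1, 'a')");
        }
    }
}
```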
And one last thing: if I implement a property that prevents the clearBatch call, the connector will then have to call clearBatch itself.
I think the proper way would be for the connector to handle BatchUpdateException.
Just to note:
- the current driver behavior is correct, because it cleans up resources after committing the changes
- the application should properly handle the result of the batch execution and fill another batch if it wants to retry
- if a communication problem occurs, the underlying layer will retry
- the batch API should be covered better in the docs to avoid misuse.
@chernser Thank you for your time! I wanted to clarify a few points:
> the Flink connector - it doesn't check whether the exception is retriable. In your example, where the schema was changed, the error is not a retriable one.
Are there any recommendations on which exceptions can be considered retriable? In my understanding, we should retry on any exception, and that is the correct approach. Someone could have manually changed the schema, and in the current state of affairs we will simply lose data, because the JDBC driver will not raise the exception again - it has implicitly cleared the batch, so the retry silently succeeds.
> - the current driver behavior is correct, because it cleans up resources after committing the changes
But the driver also cleans up the resources when no changes were made, in the case of any exception. This forces us to recreate the statement on every error in order to perform retry attempts.
> - if a communication problem occurs, the underlying layer will retry
Could you show me the place where this happens and its configuration (sorry, I'm new to ClickHouse)? I ran a test in which I stop the ClickHouse container before writing a batch, and the problem stayed the same: first an error occurs at the InputBasedPreparedStatement#executeAny level and the batch is cleared; on the retry at the connector level, InputBasedPreparedStatement#executeAny immediately returns ClickHouseValues.EMPTY_LONG_ARRAY. Example of the exception:
```
java.sql.BatchUpdateException: Connect to http://localhost:8123 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused
```
Usually only transient exceptions are retriable - for example, IOException. But JDBC doesn't define how such exceptions are passed through, and each driver may wrap them differently. In this case the application has to handle it manually. We try to handle network issues and retry when possible.
I think it is not safe to retry on any exception - there may be side effects, for example duplicates. And JDBC suggests using the information from BatchUpdateException to build another batch for the retry.
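A rough sketch of what that could look like on the connector side. It assumes the application keeps its own copy of the rows (exactly the memory overhead discussed above) and that the driver populates per-statement update counts in the exception - neither is guaranteed, which is why the null/short-array cases conservatively re-add everything:

```java
import java.sql.BatchUpdateException;
import java.sql.PreparedStatement;
import java.sql.Statement;
import java.util.List;

public class BatchRetrySketch {
    // Re-adds only the rows that are not known to have succeeded, based on the
    // per-statement update counts carried by BatchUpdateException.
    static void refillBatch(PreparedStatement stmt, List<Object[]> rows,
                            BatchUpdateException e) throws Exception {
        long[] counts = e.getLargeUpdateCounts();
        for (int i = 0; i < rows.size(); i++) {
            boolean failed = counts == null || i >= counts.length
                    || counts[i] == Statement.EXECUTE_FAILED;
            if (failed) {
                Object[] row = rows.get(i);
                for (int col = 0; col < row.length; col++) {
                    stmt.setObject(col + 1, row[col]);
                }
                stmt.addBatch();
            }
        }
    }
}
```

Combined with a deduplication token, re-adding everything when the counts are missing stays safe even if some rows had actually been written.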
The driver assumes that the application will create a new batch anyway. Unfortunately, JDBC doesn't give the application the ability to remove individual statements from a batch, which forces recreating it. We can add an option to keep the batch on failure, but I think it will not help much. Do you have a list of reasons why your batches fail? Is it schema changes, network issues, or something else?
Here is the code that handles it: https://github.com/ClickHouse/clickhouse-java/blob/main/clickhouse-http-client/src/main/java/com/clickhouse/client/http/ApacheHttpConnectionImpl.java#L286 . To make it work you need to set the number of retries with ClickHouseHttpOption.AHC_RETRY_ON_FAILURE. The new client-v2 handles this similarly - here is the corresponding setting: https://github.com/ClickHouse/clickhouse-java/blob/main/client-v2/src/main/java/com/clickhouse/client/api/Client.java#L771
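For reference, a minimal sketch of wiring that option through JDBC connection properties - assuming the 0.6.x driver forwards client options passed this way (the URL and retry count are illustrative):

```java
import com.clickhouse.client.http.config.ClickHouseHttpOption;

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class RetryOptionSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // resolve the property key from the enum instead of hard-coding the string
        props.setProperty(ClickHouseHttpOption.AHC_RETRY_ON_FAILURE.getKey(), "3");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:ch://localhost:8123/default", props)) {
            // connections created this way should retry recoverable network
            // failures inside the HTTP layer up to 3 times
        }
    }
}
```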