Need assistance with migration to either the latest JDBC driver or (preferably) client-v2
The documentation is a bit lacking, so I'd appreciate help migrating our batch-insert logic from prepared statements on the old JDBC driver to client-v2.
ClickHouse: on-prem installation, version clickhouse-server:23.8.9.54.
We have logic that inserts batches of records (50,000) into a table. A materialized view then aggregates the data into an aggregated table.
Table schema (field names obfuscated):
CREATE TABLE IF NOT EXISTS cloud.test_null_local ON CLUSTER testcluster
(
field_1 LowCardinality(String),
field_2 UInt64,
field_3 LowCardinality(String),
field_4 String,
field_5 Nullable(String),
field_6 Nullable(String),
field_7 Nullable(UInt64),
field_8 Nullable(String),
field_9 String,
field_10 String,
field_11 Nested(
key String,
value String
),
field_12 Nullable(String),
field_13 Nullable(String),
field_14 Nested(
key String,
value String
),
field_15 Nested(
name String,
id UInt16
),
field_16 Nested(
name String,
field String,
value String
),
field_17 Nullable(Float32),
field_18 Nullable(String),
field_19 Array(String),
field_20 String,
field_21 Nullable(String),
field_22 String,
field_23 Nullable(String),
field_24 Nullable(String),
field_25 String,
field_26 UInt32,
field_27 Array(String),
field_28 Nullable(DateTime),
field_29 Nullable(DateTime),
field_30 DateTime,
field_31 DateTime,
field_32 DateTime default now()
)
ENGINE = Null;
Here is the Java code:
public void save(List<TestRecord> entities) {
if (entities.size() > 0) {
log.info("Store {} in DB", entities.size());
var query = """
INSERT INTO test_null_local (field_1, field_2, /* ... other columns omitted ... */ )
VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""";
try (var connection = dataSource.getConnection();
var statement = connection.prepareStatement(query)) {
for (var entity : entities) {
statement.setString(1, entity.field1());
// other field setters omitted
statement.addBatch();
}
statement.executeBatch();
} catch (SQLException e) {
log.error("SQL error insert ", e);
}
}
}
This logic works with JDBC version 0.5.0 + HikariCP 5.0.1, and it takes ~90 seconds to insert 50,000 records.
When I simply updated the JDBC driver to version 0.9.0, the insert logic hung (I waited for 10 minutes and then stopped the process). It also prints the following to the log:
line 4:15 extraneous input '.' expecting {',', ')'}
line 4:55 extraneous input '.' expecting {',', ')'}
So that version didn't help. I then tried migrating to the client-v2 POJO insert as shown below. Client configuration:
client = new Client.Builder()
.addEndpoint(appConfig.dbUrl())
.setDefaultDatabase(appConfig.dbName())
.setUsername(appConfig.dbUser())
.setPassword(appConfig.dbPassword())
.setMaxConnections(10)
.compressClientRequest(true)
.useHttpCompression(true)
.setLZ4UncompressedBufferSize(10_058_576)
.setSocketRcvbuf(500_000)
.setSocketTcpNodelay(true)
.setSocketSndbuf(10_000_000)
.setClientNetworkBufferSize(10_000_000)
.allowBinaryReaderToReuseBuffers(true)
.build();
client.register(TestEntity.class, client.getTableSchema("test_null_local"));
Insert code:
public void save(List<TestEntity> entities) {
try (var response = client.insert(TABLE, entities).get()) {
} catch (Exception e) {
log.error("SQL error insert", e);
}
}
POJO code (getters and setters omitted):
import java.time.LocalDateTime;
import java.util.List;
public class TestEntity {
private String field_1;
private long field_2;
private String field_3;
private String field_4;
private String field_5;
private String field_6;
private Long field_7;
private String field_8;
private String field_9;
private String field_10;
private List<String> field_11_key;
private List<String> field_11_value;
private String field_12;
private String field_13;
private List<String> field_14_key;
private List<String> field_14_value;
private List<String> field_15_name;
private List<Integer> field_15_id;
private List<String> field_16_key;
private List<String> field_16_field;
private List<String> field_16_value;
private Float field_17;
private String field_18;
private List<String> field_19;
private String field_20;
private String field_21;
private String field_22;
private String field_23;
private String field_24;
private String field_25;
private long field_26;
private List<String> field_27;
private LocalDateTime field_28;
private LocalDateTime field_29;
private LocalDateTime field_30;
private LocalDateTime field_31;
private LocalDateTime field_32;
}
With this code I got a NoHttpResponseException 37 seconds after the insert call:
c.c.c.a.Client:1250 Retrying. org.apache.hc.core5.http.NoHttpResponseException: [DBURL here] failed to respond
I tried reducing the batch to 10,000 records, but it didn't help. Batches of several hundred records could be inserted successfully, but that rather contradicts the batching guidelines.
Good day, @DKyryk! Thank you for the feedback.
JDBC v2 has an option to perform inserts using the RowBinary format when parameters are not wrapped in function calls. Please set the driver property row_binary_for_simple_insert=true to enable it.
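For reference, a minimal sketch of passing that property (the JDBC URL, host and database here are placeholders, and whether the property should go through the connection Properties or through HikariCP data-source properties is an assumption on my side):
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

// Plain JDBC: pass the property together with the connection
Properties props = new Properties();
props.setProperty("row_binary_for_simple_insert", "true");
Connection conn = DriverManager.getConnection("jdbc:clickhouse://localhost:8123/cloud", props);

// HikariCP (as in your code): forward it as a data-source property
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:clickhouse://localhost:8123/cloud");
config.addDataSourceProperty("row_binary_for_simple_insert", "true");
HikariDataSource dataSource = new HikariDataSource(config);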
If there is no requirement to use JDBC, we recommend using the Java client directly, as it is more optimized (fewer checks and type conversions compared to JDBC).
The default JDBC implementation builds a string statement and sends it as is. We are aware of the problem and will come up with a solution, but JDBC has some API constraints.
@DKyryk regarding
line 4:15 extraneous input '.' expecting {',', ')'}
line 4:55 extraneous input '.' expecting {',', ')'}
It should be fixed in 0.9.1, but I will re-test it. However, I may need the complete statement.
@DKyryk regarding client-v2 usage:
- The configuration looks right.
- Please try removing .useHttpCompression(true): this option uses a suboptimal LZ4 implementation, and ClickHouse's own LZ4 support works better.
- Please try removing .setSocketTcpNodelay(true): this option may not work well for big batches.
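For clarity, your builder with those two options removed would look roughly like this (a sketch; every other setting is kept exactly as in the configuration you posted):
client = new Client.Builder()
        .addEndpoint(appConfig.dbUrl())
        .setDefaultDatabase(appConfig.dbName())
        .setUsername(appConfig.dbUser())
        .setPassword(appConfig.dbPassword())
        .setMaxConnections(10)
        .compressClientRequest(true)               // keep client-side LZ4 request compression
        // .useHttpCompression(true)               // removed as suggested
        .setLZ4UncompressedBufferSize(10_058_576)
        .setSocketRcvbuf(500_000)
        // .setSocketTcpNodelay(true)              // removed as suggested
        .setSocketSndbuf(10_000_000)
        .setClientNetworkBufferSize(10_000_000)
        .allowBinaryReaderToReuseBuffers(true)
        .build();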
Thanks!
@chernser Thank you for the quick reply. You've helped a lot.
The preferred way for us is to switch to client-v2 and use it everywhere, since it's the recommended client (and will hopefully be improved further).
I removed both options and it solved the issue. Inserting 50,000 records took ~18 seconds on average.
I also tried batches of ~10,000 records and they took ~5 seconds on average.
Is that duration roughly what's expected? I know it's a vague question and depends on a lot of things (data, CPU, memory constraints, etc.). I just want to make sure it's not expected to be within milliseconds.
Regarding the prepared statement, I'll provide a full example tomorrow. Hopefully it helps fix the issue on your side (if it exists there).
@DKyryk glad to hear that it helped!
I think 5 seconds for 10K records is a bit slow; I would expect 1-2 seconds for a remote application. But this depends heavily on many parameters.
- First, you would need to know the size of a batch.
- Try tuning the socket and network buffers; 10 MB may be too small. Try 30 MB or 50 MB, but no more than 100 MB (see the sketch after this list).
- What platform do you run the application on? (We have native compression support for Linux only.)
- If possible, check JVM stats such as GC activity and heap structure.
- Increase JVM memory to leave room for the batch and the client buffers.
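As a rough sketch of the buffer tuning mentioned above (the 30 MB figures are only an example starting point, not a recommendation for your exact workload):
client = new Client.Builder()
        // ... endpoint, credentials and the other options as before ...
        .setSocketRcvbuf(30_000_000)               // was 500_000
        .setSocketSndbuf(30_000_000)               // was 10_000_000
        .setClientNetworkBufferSize(30_000_000)    // was 10_000_000
        .build();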
Good day. As promised, here is the full insert code with the prepared statement. Just a reminder that it works on JDBC 0.5.0 but fails on 0.9.0.
public void save(List<TestRecord> entities) {
if (entities.size() > 0) {
var query = """
INSERT INTO test_null_local (field_1, field_2, field_4, field_9, field_10, field_20,
field_21, field_22, field_23, field_24, field_25, field_27, field_30, field_31, field_3, field_5,
field_11.key, field_11.value, field_12, field_13, field_14.key, field_14.value,
field_15.name, field_15.id, field_16.name, field_16.field, field_16.value, field_18, field_19,
field_28, field_29, field_6, field_17, field_26, field_7, field_8)
VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""";
try (var connection = dataSource.getConnection();
var statement = connection.prepareStatement(query)) {
for (var entity : entities) {
statement.setString(1, entity.field1());
statement.setLong(2, entity.field2());
statement.setString(3, entity.field4());
statement.setString(4, entity.field9());
statement.setString(5, entity.field10());
statement.setString(6, entity.field20());
statement.setString(7, entity.field21());
statement.setString(8, entity.field22());
statement.setString(9, entity.field23());
statement.setString(10, entity.field24());
statement.setString(11, entity.field25());
statement.setObject(12, entity.field27());
statement.setObject(13, entity.field30());
statement.setObject(14, entity.field31());
statement.setString(15, entity.field3());
statement.setString(16, entity.field5());
statement.setObject(17, entity.field11keys());
statement.setObject(18, entity.field11values());
statement.setString(19, entity.field12());
statement.setString(20, entity.field13());
statement.setObject(21, entity.field14keys());
statement.setObject(22, entity.field14values());
statement.setObject(23, entity.field15names());
statement.setObject(24, entity.field15ids());
statement.setObject(25, entity.field16names());
statement.setObject(26, entity.field16fields());
statement.setObject(27, entity.field16values());
statement.setString(28, entity.field18());
statement.setObject(29, entity.field19());
statement.setObject(30, entity.field28());
statement.setObject(31, entity.field29());
statement.setString(32, entity.field6());
statement.setObject(33, entity.field17());
statement.setInt(34, entity.field26());
statement.setObject(35, entity.field7());
statement.setObject(36, entity.field8());
statement.addBatch();
}
statement.executeBatch();
} catch (SQLException e) {
log.error("SQL error insert", e);
}
}
}
And the entity for reference:
import java.time.Instant;

public record TestRecord(String field1,
long field2,
String field3,
String field4,
String field5,
String field6,
String field8,
Long field7,
String[] field11keys,
String[] field11values,
String field12,
String field13,
String[] field14keys,
String[] field14values,
String[] field15names,
int[] field15ids,
String field18,
String[] field19,
String[] field16names,
String[] field16fields,
String[] field16values,
Double field17,
Instant field28,
Instant field29,
String field9,
String field10,
String field20,
String field21,
String field22,
String field23,
String field24,
String field25,
int field26,
String[] field27,
Instant field30,
Instant field31) {
}
We'll move further with client-v2.
Regarding your questions: we run the app in AWS Kubernetes, so the nodes are on Amazon Linux. The app itself uses a Temurin Java 21 base image with Ubuntu 22.04.3 LTS.
Regarding resources, we are a bit constrained due to cluster limitations.
I'll try tuning the parameters and monitor the app.
I'm not sure whether I should close this issue or not.
Thank you for your help.
@DKyryk thank you for the statement! Is it the one that gives the error?
If your data source is something like a JSON stream and you do not need to pre-process it, then using client-v2 and sending JSONEachRow may be the best option for you. If you have a DTO (POJO), then POJO serialization might be the best choice. We are going to improve this area further.
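A minimal sketch of the JSONEachRow path (this assumes client-v2's InputStream-based insert overload and the ClickHouseFormat.JSONEachRow constant; please verify the exact signature for your client version, and the sample rows are invented):
import com.clickhouse.data.ClickHouseFormat;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

String rows = """
        {"field_1":"a","field_2":1,"field_4":"x"}
        {"field_1":"b","field_2":2,"field_4":"y"}
        """;
try (var response = client.insert("test_null_local",
        new ByteArrayInputStream(rows.getBytes(StandardCharsets.UTF_8)),
        ClickHouseFormat.JSONEachRow).get()) {
    // nothing to read from the response for a plain insert
} catch (Exception e) {
    log.error("SQL error insert", e);
}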
Yes, that's the full code that works in JDBC version 0.5.0 but not in 0.9.0, and it prints the following to the log:
line 4:15 extraneous input '.' expecting {',', ')'}
line 4:55 extraneous input '.' expecting {',', ')'}
Good day @chernser. We get occasional NoHttpResponseExceptions from time to time in different services for different queries (no noticeable pattern to narrow it down to something specific). Client v2 is used everywhere. Each time the query retries successfully.
The error message is:
Retrying. org.apache.hc.core5.http.NoHttpResponseException: [DB_URL] failed to respond
What could be the possible cause?
Good day, @DKyryk !
I suspect it is a stale connection. We had a similar issue with v1, and that was the reason we implemented retries.
- Try setting com.clickhouse.client.api.ClientConfigProperties#SOCKET_KEEPALIVE_OPT via com.clickhouse.client.api.Client.Builder#setOption: .setOption(SOCKET_KEEPALIVE_OPT.getKey(), "true")
- Check server logs for errors.
- Set a lower keep-alive timeout for connections: com.clickhouse.client.api.Client.Builder#setKeepAliveTimeout
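Put together, the two client-side suggestions would look roughly like this (a sketch; the 3-second value is only an example, and the exact setKeepAliveTimeout signature may differ between client versions):
import static com.clickhouse.client.api.ClientConfigProperties.SOCKET_KEEPALIVE_OPT;
import java.time.temporal.ChronoUnit;

client = new Client.Builder()
        // ... endpoint, credentials and the other options as before ...
        .setOption(SOCKET_KEEPALIVE_OPT.getKey(), "true")
        .setKeepAliveTimeout(3, ChronoUnit.SECONDS)   // signature assumed; verify for your version
        .build();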
Thanks for the suggestions. I tried setting both SOCKET_KEEPALIVE_OPT and the keep-alive timeout. Results:
- With a 3-second timeout, retries still happened, but less often than before.
- With a 10-second timeout, there were a lot of retries.
- With a 2-second timeout, there have been no retries so far (3 days have passed).
Good day, @DKyryk ! Thank you for letting us know!
It is quite strange that connections become stale within 10 seconds. We will investigate.
I'm seeing the same behavior with keep-alive and "failed to respond".
The server (via haproxy) returns keep-alive: timeout=3, max=9999. But when I set connection: close on each request, all the problems go away.
I'm not a Java expert, but from the docs it looks like evictExpiredConnections() is missing on the HttpClientBuilder.
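Roughly, what I mean is something like this in the client's HttpClient setup (a sketch of the Apache HttpClient 5 builder calls I'm referring to):
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClientBuilder;
import org.apache.hc.core5.util.TimeValue;

CloseableHttpClient httpClient = HttpClientBuilder.create()
        .evictExpiredConnections()                    // drop connections past their keep-alive
        .evictIdleConnections(TimeValue.ofSeconds(3)) // match haproxy's timeout=3
        .build();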
Good day, @bobelev! What are the server and client versions?
We will look into it shortly.
Thanks!
client: 0.9.1; server: 24.8 and 25.8