clickhouse-java

Need assistance with migration to either latest jdbc or preferably client-v2

Open DKyryk opened this issue 4 months ago • 15 comments

The documentation is a bit lacking, so I'd appreciate assistance migrating our batch-insert logic from prepared statements on the old JDBC driver to client-v2.

ClickHouse: on-prem installation, version clickhouse-server:23.8.9.54

We have logic that inserts batches of records (50,000) into a table. A materialized view then aggregates the data into an aggregated table.

Table schema (field names obfuscated)

CREATE TABLE IF NOT EXISTS cloud.test_null_local ON CLUSTER testcluster
(
    field_1                LowCardinality(String),
    field_2               UInt64,
    field_3               LowCardinality(String),
    field_4               String,
    field_5               Nullable(String),
    field_6               Nullable(String),
    field_7               Nullable(UInt64),
    field_8               Nullable(String),
    field_9               String,
    field_10              String,
    field_11              Nested(
        key String,
        value String
    ),
    field_12              Nullable(String),
    field_13              Nullable(String),
    field_14              Nested(
        key String,
        value String
    ),
    field_15               Nested(
        name String,
        id UInt16
    ),
    field_16               Nested(
        name String,
        field String,
        value String
    ),
    field_17               Nullable(Float32),
    field_18               Nullable(String),
    field_19               Array(String),
    field_20              String,
    field_21               Nullable(String),
    field_22               String,
    field_23               Nullable(String),
    field_24               Nullable(String),
    field_25               String,
    field_26               UInt32,
    field_27               Array(String),
    field_28               Nullable(DateTime),
    field_29               Nullable(DateTime),
    field_30               DateTime,
    field_31                DateTime,
    field_32               DateTime default now()
)
ENGINE = Null;

Here is the Java code:

public void save(List<TestRecord> entities) {

        if (entities.size() > 0) {
            log.info("Store {} in DB", entities.size());
            var query = """
                INSERT INTO test_null_local (field_1, field_2, //... other columns omitted    )
                VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
                """;
            try (var connection = dataSource.getConnection();
                 var statement = connection.prepareStatement(query)) {

                for (var entity : entities) {
                    statement.setString(1, entity.field1());
                    // ... other field setters ...
                    statement.addBatch();
                }
                statement.executeBatch();

            } catch (SQLException e) {
                log.error("SQL error insert ", e);
            }
        }
    }

This logic works with JDBC version 0.5.0 + HikariCP 5.0.1, and it takes ~90 seconds to insert 50,000 records.

When I simply updated the JDBC driver to version 0.9.0, the insert logic hung (I waited 10 minutes and then stopped the process). It also prints to the log:

line 4:15 extraneous input '.' expecting {',', ')'}
line 4:55 extraneous input '.' expecting {',', ')'}

So that version didn't help. I then tried migrating to the client-v2 POJO insert, like this. Client configuration:

client = new Client.Builder()
            .addEndpoint(appConfig.dbUrl())
            .setDefaultDatabase(appConfig.dbName())
            .setUsername(appConfig.dbUser())
            .setPassword(appConfig.dbPassword())
            .setMaxConnections(10)
            .compressClientRequest(true)
            .useHttpCompression(true)
            .setLZ4UncompressedBufferSize(10_058_576)
            .setSocketRcvbuf(500_000)
            .setSocketTcpNodelay(true)
            .setSocketSndbuf(10_000_000)
            .setClientNetworkBufferSize(10_000_000)
            .allowBinaryReaderToReuseBuffers(true)
            .build();
        client.register(TestEntity.class, client.getTableSchema("test_null_local"));

Insert code:

    public void save(List<TestEntity> entities) {
        
        try (var response = client.insert(TABLE, entities).get()) {
            // nothing to read from the response; try-with-resources closes it
        } catch (Exception e) {
            log.error("SQL error insert", e);
        }
    }

POJO code (getters and setters omitted):

import java.time.LocalDateTime;
import java.util.List;

public class TestEntity {

    private String field_1;
    private long field_2;
    private String field_3;
    private String field_4;
    private String field_5;
    private String field_6;
    private Long field_7;
    private String field_8;
    private String field_9;
    private String field_10;
    private List<String> field_11_key;
    private List<String> field_11_value;
    private String field_12;
    private String field_13;
    private List<String> field_14_key;
    private List<String> field_14_value;
    private List<String> field_15_name;
    private List<Integer> field_15_id;
    private List<String> field_16_key;
    private List<String> field_16_field;
    private List<String> field_16_value;
    private Float field_17;
    private String field_18;
    private List<String> field_19;
    private String field_20;
    private String field_21;
    private String field_22;
    private String field_23;
    private String field_24;
    private String field_25;
    private long field_26;
    private List<String> field_27;
    private LocalDateTime field_28;
    private LocalDateTime field_29;
    private LocalDateTime field_30;
    private LocalDateTime field_31;
    private LocalDateTime field_32;
}

With this code I get a NoHttpResponseException 37 seconds after the insert call:

c.c.c.a.Client:1250 Retrying. org.apache.hc.core5.http.NoHttpResponseException: [DBURL here] failed to respond

I tried reducing the batch to 10,000 records, but it didn't help. A few hundred records can be inserted successfully, but that rather contradicts the batching guidelines.

DKyryk avatar Aug 06 '25 16:08 DKyryk

Good day, @DKyryk! Thank you for the feedback.

JDBC v2 has an option to perform inserts using RowBinary format when parameters are not wrapped in function calls. Please set the driver property row_binary_for_simple_insert=true to enable it.
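For illustration, here is a minimal sketch of passing that property when opening a connection. The JDBC URL is a placeholder; with HikariCP you would set the same property via HikariConfig#addDataSourceProperty instead:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class RowBinaryInsertExample {
    public static void main(String[] args) throws SQLException {
        Properties props = new Properties();
        // Enables the RowBinary path for simple parameterized inserts in JDBC v2.
        props.setProperty("row_binary_for_simple_insert", "true");
        // Placeholder URL: adjust host, port and database to your setup.
        try (Connection connection = DriverManager.getConnection(
                "jdbc:clickhouse://localhost:8123/cloud", props)) {
            // prepareStatement / addBatch / executeBatch as in the code above
        }
    }
}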

If there is no requirement to use JDBC, we recommend using the Java client directly, as it is the more optimized path (fewer checks and type conversions compared to JDBC).

The default JDBC implementation builds a string statement and sends it as is. We are aware of the problem and will come up with a solution; JDBC just has some API constraints.

chernser avatar Aug 06 '25 17:08 chernser

@DKyryk regarding

line 4:15 extraneous input '.' expecting {',', ')'}
line 4:55 extraneous input '.' expecting {',', ')'}

It should be fixed in 0.9.1, but I will re-test it. However, I may need the complete statement.

chernser avatar Aug 06 '25 17:08 chernser

@DKyryk regarding client-v2 usage

  • The configuration seems right.
  • Please try removing .useHttpCompression(true) - this option uses a suboptimal LZ4 implementation, and ClickHouse's own LZ4 support works better.
  • Please try removing .setSocketTcpNodelay(true) - this option may not work well for big batches. A revised builder sketch follows.
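For reference, your builder with those two options removed might look like this (all other settings unchanged from your snippet):

client = new Client.Builder()
        .addEndpoint(appConfig.dbUrl())
        .setDefaultDatabase(appConfig.dbName())
        .setUsername(appConfig.dbUser())
        .setPassword(appConfig.dbPassword())
        .setMaxConnections(10)
        .compressClientRequest(true)            // keep client-side LZ4 compression
        // .useHttpCompression(true)            // removed: ClickHouse's LZ4 works better
        // .setSocketTcpNodelay(true)           // removed: may hurt big batches
        .setLZ4UncompressedBufferSize(10_058_576)
        .setSocketRcvbuf(500_000)
        .setSocketSndbuf(10_000_000)
        .setClientNetworkBufferSize(10_000_000)
        .allowBinaryReaderToReuseBuffers(true)
        .build();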

Thanks!

chernser avatar Aug 06 '25 17:08 chernser

@chernser Thank you for quick reply. You've helped a lot.

The preferred way for us is to switch to client-v2 and use it everywhere, as it's the recommended client (and hopefully will be improved further).

I removed both options and it solved the issue. Inserting 50,000 records took ~18 seconds on average.

I also tried batches of ~10,000, which took ~5 seconds on average.

Is that duration about what's expected? I know it's a vague question that depends on a lot of things (data, CPU, memory constraints, etc.). I just want to make sure it's not expected to be within milliseconds.

Regarding the prepared statement, I'll provide a full example tomorrow. I hope it helps fix the issue on your side (if it exists on your side).

DKyryk avatar Aug 06 '25 17:08 DKyryk

@DKyryk glad to hear that it helped!

I think 5 seconds for 10K records is a bit slow. I would expect 1-2 seconds for a remote application, but this depends heavily on many parameters.

  1. First, you would need to know the size of a batch.
  2. Try tuning the socket and network buffers. 10 MB may be too small; try 30 MB or 50 MB, but no more than 100 MB. See the sketch after this list.
  3. What platform do you run the application on? (We have native compression support for Linux only.)
  4. If possible, check JVM stats such as GC activity and heap structure.
  5. Increase JVM memory to leave room for storing the batch and client buffers.
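For item 2, a sketch with larger buffers, using the same builder methods as in your configuration (the 30 MB values are illustrative, not tested recommendations):

client = new Client.Builder()
        .addEndpoint(appConfig.dbUrl())
        // ... credentials and other options as before ...
        .setSocketRcvbuf(30_000_000)            // was 500_000
        .setSocketSndbuf(30_000_000)            // was 10_000_000
        .setClientNetworkBufferSize(30_000_000) // was 10_000_000
        .build();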

chernser avatar Aug 06 '25 18:08 chernser

Good day. As promised, here is the full insert code with a prepared statement. Just a reminder that it works on JDBC 0.5.0 but fails on 0.9.0.

public void save(List<TestRecord> entities) {

        if (entities.size() > 0) {
            var query = """
                INSERT INTO test_null_local (field_1, field_2, field_4, field_9, field_10, field_20,
                field_21, field_22, field_23, field_24, field_25, field_27, field_30, field_31, field_3, field_5,
                field_11.key, field_11.value, field_12, field_13, field_14.key, field_14.value,
                field_15.name, field_15.id, field_16.name, field_16.field, field_16.value, field_18, field_19,
                field_28, field_29, field_6, field_17, field_26, field_7, field_8)
                VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
                """;
            try (var connection = dataSource.getConnection();
                 var statement = connection.prepareStatement(query)) {

                for (var entity : entities) {
                    statement.setString(1, entity.field1());
                    statement.setLong(2, entity.field2());
                    statement.setString(3, entity.field4());
                    statement.setString(4, entity.field9());
                    statement.setString(5, entity.field10());
                    statement.setString(6, entity.field20());
                    statement.setString(7, entity.field21());
                    statement.setString(8, entity.field22());
                    statement.setString(9, entity.field23());
                    statement.setString(10, entity.field24());
                    statement.setString(11, entity.field25());
                    statement.setObject(12, entity.field27());
                    statement.setObject(13, entity.field30());
                    statement.setObject(14, entity.field31());
                    statement.setString(15, entity.field3());
                    statement.setString(16, entity.field5());
                    statement.setObject(17, entity.field11keys());
                    statement.setObject(18, entity.field11values());
                    statement.setString(19, entity.field12());
                    statement.setString(20, entity.field13());
                    statement.setObject(21, entity.field14keys());
                    statement.setObject(22, entity.field14values());
                    statement.setObject(23, entity.field15names());
                    statement.setObject(24, entity.field15ids());
                    statement.setObject(25, entity.field16names());
                    statement.setObject(26, entity.field16fields());
                    statement.setObject(27, entity.field16values());
                    statement.setString(28, entity.field18());
                    statement.setObject(29, entity.field19());
                    statement.setObject(30, entity.field28());
                    statement.setObject(31, entity.field29());
                    statement.setString(32, entity.field6());
                    statement.setObject(33, entity.field17());
                    statement.setInt(34, entity.field26());
                    statement.setObject(35, entity.field7());
                    statement.setObject(36, entity.field8());

                    statement.addBatch();
                }
                statement.executeBatch();

            } catch (SQLException e) {
                log.error("SQL error insert", e);
            }
        }
    }

And the entity for reference:

import java.time.Instant;

public record TestRecord(String field1,
                         long field2,
                         String field3,
                         String field4,
                         String field5,
                         String field6,
                         String field8,
                         Long field7,
                         String[] field11keys,
                         String[] field11values,
                         String field12,
                         String field13,
                         String[] field14keys,
                         String[] field14values,
                         String[] field15names,
                         int[] field15ids,
                         String field18,
                         String[] field19,
                         String[] field16names,
                         String[] field16fields,
                         String[] field16values,
                         Double field17,
                         Instant field28,
                         Instant field29,
                         String field9,
                         String field10,
                         String field20,
                         String field21,
                         String field22,
                         String field23,
                         String field24,
                         String field25,
                         int field26,
                         String[] field27,
                         Instant field30,
                         Instant field31) {
}

We'll move further with client-v2.

Regarding your questions: we run the app in AWS Kubernetes, so the nodes are on Amazon Linux. The app itself uses a Temurin Java 21 base image with Ubuntu 22.04.3 LTS.

Regarding resources, we are a bit constrained due to cluster limitations.

I'll try tuning the parameters and monitoring the app.

I don't know whether I should close this issue or not.

Thank you for your help.

DKyryk avatar Aug 07 '25 10:08 DKyryk

@DKyryk thank you for the statement! Is it the one that gives the error?

If your data source is something like a JSON stream and you do not need to pre-process it, then using client-v2 and sending JSONEachRow may be the best option for you (a minimal sketch follows). If you have DTOs (POJOs), then POJO serialization might be the best choice. We are going to improve this area further.
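A minimal sketch of a JSONEachRow insert with client-v2, assuming the insert(String, InputStream, ClickHouseFormat) overload; the in-memory stream here stands in for a real JSON source, and client/log are fields as in your snippets:

import com.clickhouse.data.ClickHouseFormat;
import java.io.ByteArrayInputStream;
import java.io.InputStream;

public void saveRaw(byte[] jsonEachRowBytes) {
    // One JSON object per line, keys matching the table's column names.
    InputStream data = new ByteArrayInputStream(jsonEachRowBytes);
    try (var response = client.insert("test_null_local", data,
            ClickHouseFormat.JSONEachRow).get()) {
        // response is closed automatically; metrics can be read here if needed
    } catch (Exception e) {
        log.error("SQL error insert", e);
    }
}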

chernser avatar Aug 07 '25 14:08 chernser

Yes, that's the full code that works in JDBC version 0.5.0 but not in 0.9.0, where it prints to the log:

line 4:15 extraneous input '.' expecting {',', ')'}
line 4:55 extraneous input '.' expecting {',', ')'}

DKyryk avatar Aug 08 '25 11:08 DKyryk

Good day @chernser. We get occasional NoHttpResponseException errors from time to time in different services for different queries (no noticeable pattern that would narrow it down to something specific). Client-v2 is used everywhere. Each time the query retries successfully.

The error message is: Retrying. org.apache.hc.core5.http.NoHttpResponseException: [DB_URL] failed to respond

What could be the possible cause?

DKyryk avatar Aug 18 '25 12:08 DKyryk

Good day, @DKyryk !

I suspect it is a stale connection. We had a similar issue with v1, and that was one reason we implemented retries.

  • Try setting com.clickhouse.client.api.ClientConfigProperties#SOCKET_KEEPALIVE_OPT via com.clickhouse.client.api.Client.Builder#setOption, i.e. .setOption(SOCKET_KEEPALIVE_OPT.getKey(), "true").
  • Check the server logs for errors.
  • Set a lower keep-alive timeout for connections via com.clickhouse.client.api.Client.Builder#setKeepAliveTimeout. A combined sketch follows the list.
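Putting the first and third items together, a sketch of the suggested settings; the exact setKeepAliveTimeout signature is assumed and may differ between client versions:

import static com.clickhouse.client.api.ClientConfigProperties.SOCKET_KEEPALIVE_OPT;
import java.time.temporal.ChronoUnit;

client = new Client.Builder()
        .addEndpoint(appConfig.dbUrl())
        // ... credentials and other options as before ...
        .setOption(SOCKET_KEEPALIVE_OPT.getKey(), "true") // enable TCP keep-alive probes
        .setKeepAliveTimeout(2, ChronoUnit.SECONDS)       // assumed signature; keep below the server's keep-alive window
        .build();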

chernser avatar Aug 20 '25 05:08 chernser

Thanks for the suggestions. I tried setting both SOCKET_KEEPALIVE_OPT and the keep-alive timeout. The results: with a 3-second timeout, retries still happened, though less often than before; with a 10-second timeout, there were a lot of retries; with a 2-second timeout, there have been no retries so far (3 days have passed).

DKyryk avatar Aug 28 '25 14:08 DKyryk

Good day, @DKyryk ! Thank you for letting us know!

It is quite strange that stale connections occur even with a 10-second timeout. We will investigate.

chernser avatar Aug 28 '25 19:08 chernser

I'm seeing the same behavior for keep-alive and failed to respond.

The server (via HAProxy) returns keep-alive: timeout=3, max=9999. But when I set Connection: close on each request, all the problems go away.

I'm not a Java expert, but from the docs it looks like we are missing evictExpiredConnections() on HttpClientBuilder.
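For reference, this is the Apache HttpClient 5 builder option being referred to, shown on a plain HttpClientBuilder rather than the ClickHouse client's internal wiring:

import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.util.TimeValue;

// Evict connections the server has already closed, so a request never
// reuses a connection that is past the keep-alive window.
CloseableHttpClient httpClient = HttpClients.custom()
        .evictExpiredConnections()
        .evictIdleConnections(TimeValue.ofSeconds(3)) // match keep-alive: timeout=3
        .build();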

bobelev avatar Nov 13 '25 06:11 bobelev

Good day, @bobelev! What are the server and client versions?

We will look into it shortly.

Thanks!

chernser avatar Nov 13 '25 15:11 chernser

Client: 0.9.1. Server: 24.8 and 25.8.

bobelev avatar Nov 14 '25 09:11 bobelev