
Insert function is not working

Open shabir1 opened this issue 1 year ago • 8 comments

I am not able to insert CSV data into an existing table in ClickHouse using clickhouse-java.

I ran the code below, but it throws the following error:

static long insert(ClickHouseNode server, String table) throws ClickHouseException {
        try (ClickHouseClient client = ClickHouseClient.newInstance(server.getProtocol())) {
            ClickHouseRequest.Mutation request = client.connect(server).write().table(table)
                    .format(ClickHouseFormat.RowBinary);
            ClickHouseConfig config = request.getConfig();
            CompletableFuture<ClickHouseResponse> future;
            // back-pressuring is not supported, you can adjust the first two arguments
            try (ClickHousePipedOutputStream stream = ClickHouseDataStreamFactory.getInstance()
                    .createPipedOutputStream(config, (Runnable) null)) {
                // in async mode, which is default, execution happens in a worker thread
                future = request.data(stream.getInputStream()).execute();

                // writing happens in main thread
                for (int i = 0; i < 10_000; i++) {
                    BinaryStreamUtils.writeString(stream, String.valueOf(i % 16));
                    BinaryStreamUtils.writeNonNull(stream);
                    BinaryStreamUtils.writeString(stream, UUID.randomUUID().toString());
                }
            }

            // response should be always closed
            try (ClickHouseResponse response = future.get()) {
                ClickHouseResponseSummary summary = response.getSummary();
                return summary.getWrittenRows();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw ClickHouseException.forCancellation(e, server);
        } catch (ExecutionException | IOException e) {
            throw ClickHouseException.of(e, server);
        }
    }

Error

java.util.concurrent.ExecutionException: com.clickhouse.client.ClickHouseException: Code: 33. DB::Exception: Cannot read all data. Bytes read: 2. Bytes expected: 4.: (at row 14272)
: While executing BinaryRowInputFormat. (CANNOT_READ_ALL_DATA) (version 23.6.2.18 (official build))
, server ClickHouseNode [uri=http://localhost:8123/ShabirBhat_project]@-2130346309
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
	at com.datoin.ch.CHclientTest.insert(CHclientTest.java:77)
	at com.datoin.ch.CHclientTest.main(CHclientTest.java:40)
Caused by: com.clickhouse.client.ClickHouseException: Code: 33. DB::Exception: Cannot read all data. Bytes read: 2. Bytes expected: 4.: (at row 14272)
: While executing BinaryRowInputFormat. (CANNOT_READ_ALL_DATA) (version 23.6.2.18 (official build))
, server ClickHouseNode [uri=http://localhost:8123/ShabirBhat_project]@-2130346309
	at com.clickhouse.client.ClickHouseException.of(ClickHouseException.java:168)
	at com.clickhouse.client.AbstractClient.lambda$execute$0(AbstractClient.java:275)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Code: 33. DB::Exception: Cannot read all data. Bytes read: 2. Bytes expected: 4.: (at row 14272)
: While executing BinaryRowInputFormat. (CANNOT_READ_ALL_DATA) (version 23.6.2.18 (official build))

	at com.clickhouse.client.http.HttpUrlConnectionImpl.checkResponse(HttpUrlConnectionImpl.java:184)
	at com.clickhouse.client.http.HttpUrlConnectionImpl.post(HttpUrlConnectionImpl.java:227)
	at com.clickhouse.client.http.ClickHouseHttpClient.send(ClickHouseHttpClient.java:124)
	at com.clickhouse.client.AbstractClient.sendAsync(AbstractClient.java:161)
	at com.clickhouse.client.AbstractClient.lambda$execute$0(AbstractClient.java:273)
	... 4 more

Process finished with exit code 0

I tried to replace the code below with my CSV data:

 // writing happens in main thread
                for (int i = 0; i < 10_000; i++) {
                    BinaryStreamUtils.writeString(stream, String.valueOf(i % 16));
                    BinaryStreamUtils.writeNonNull(stream);
                    BinaryStreamUtils.writeString(stream, UUID.randomUUID().toString());
                }


Replaced with:
try (BufferedReader reader = new BufferedReader(new FileReader(csvFilePath))) {
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
        BinaryStreamUtils.writeString(stream, line);
    }
}

shabir1 avatar Jul 13 '23 07:07 shabir1
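Why the replacement fails: in RowBinary, each String value is encoded as an unsigned LEB128 varint length followed by the raw UTF-8 bytes, one value per declared column, so feeding whole CSV lines through BinaryStreamUtils.writeString produces a byte stream the server cannot map onto the table's columns (hence CANNOT_READ_ALL_DATA at some row). A minimal, library-free sketch of that encoding (the varint logic mirrors what the client does; class and method names here are illustrative, not part of the clickhouse-java API):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class RowBinaryStringDemo {
    // RowBinary String encoding: unsigned LEB128 varint length,
    // then the raw UTF-8 bytes. One such value is expected per
    // String column, per row.
    static byte[] encodeString(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        int len = utf8.length;
        while ((len & ~0x7F) != 0) {          // emit 7 bits at a time
            out.write((len & 0x7F) | 0x80);
            len >>>= 7;
        }
        out.write(len);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] encoded = encodeString("1,foo,bar"); // a whole CSV line
        // The server reads a single 9-byte string as the FIRST column's
        // value, not three separate column values -- the stream is
        // misaligned from then on.
        System.out.println(encoded[0]);      // length prefix: 9
        System.out.println(encoded.length);  // 1 prefix byte + 9 data bytes
    }
}
```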

Hi @shabir1, apologies for the confusing methods. BinaryStreamUtils.writeString only works for writing a String in the RowBinary data format. It looks like this duplicates #1402? Let me know if you want to close this and carry on the discussion there.

zhicwu avatar Jul 13 '23 11:07 zhicwu
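Building on the comment above: if the input is already CSV, one option is to keep the piped-stream approach but declare the request format as ClickHouseFormat.CSV and copy the raw file bytes through unchanged, letting the server do the parsing. Below is a sketch of just the byte-copy part (plain Java, no client dependency, so it can be run standalone); the surrounding request setup would be the same as in the original snippet, with .format(ClickHouseFormat.CSV) instead of RowBinary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class CsvCopyDemo {
    // Copies raw bytes from the CSV source into the request stream.
    // No per-value encoding: with format CSV the server parses the
    // text itself, so the bytes must not be re-wrapped as RowBinary.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        for (int n; (n = in.read(buf)) > 0; total += n) {
            out.write(buf, 0, n);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        String csv = "a,b\n1,x\n2,y\n";
        // stands in for the ClickHousePipedOutputStream of the snippet
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long written = copy(
                new ByteArrayInputStream(csv.getBytes(StandardCharsets.UTF_8)), sink);
        System.out.println(written); // 12 bytes, passed through verbatim
    }
}
```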

@shabir1 did you manage?

mzitnik avatar Aug 02 '23 08:08 mzitnik

@mzitnik I used the method below to upload a local file to ClickHouse.

static long insert(String tableName, String csvFilePath) throws Exception {
        ClickHouseNode server = getServer();
        try (BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream(csvFilePath))) {
            ClickHousePassThruStream stream = ClickHousePassThruStream.of(
                    bufferedInputStream,
                    ClickHouseCompression.NONE,
                    0,
                    ClickHouseFormat.CSVWithNames
            );
            CompletableFuture<ClickHouseResponseSummary> load = ClickHouseClient.load(server, tableName, stream);
            ClickHouseResponseSummary clickHouseResponse = load.get();
            return clickHouseResponse.getWrittenRows();
        } catch (Exception e) {
            System.out.println("Failed to load");
            e.printStackTrace();
            // preserve the cause instead of throwing a bare Exception
            throw new Exception("Failed to load " + csvFilePath, e);
        }
    }

But on huge files it failed with memory issues; most failures occurred on files that contained more than 300 columns.

ERROR [2023-07-27 06:20:36,451] com.datoin.library.dataset.sql.utility.commons.abstracts.sql.AbstractDBEngine: Failed to load dataset
java.util.concurrent.ExecutionException: com.clickhouse.client.ClickHouseException: Code: 241. DB::Exception: Memory limit (total) exceeded:
would use 3.36 GiB (attempt to allocate chunk of 4360384 bytes), maximum: 3.24 GiB. OvercommitTracker decision: Query was selected to stop by
 OvercommitTracker. (MEMORY_LIMIT_EXCEEDED) (version 23.6.2.18 (official build))
, server ClickHouseNode [uri=http://localhost:8123/cayleytest95814, options={connect_timeout=5000000,socket_timeout=5000000}]@75218225
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)

shabir1 avatar Aug 03 '23 05:08 shabir1

@shabir1 From the error it looks like a ClickHouse server memory limit issue. Did you try raising the memory limit?

mzitnik avatar Aug 03 '23 10:08 mzitnik

@mzitnik How do I set the memory limit in the above method? And if I set max_memory_usage_for_user = 10, is that 10 bytes, MB, or GB?

shabir1 avatar Aug 07 '23 06:08 shabir1

@shabir1 It is in bytes.

mzitnik avatar Aug 07 '23 16:08 mzitnik

@mzitnik How do I set memory limits such as max_memory_usage_for_user or max_memory_usage in the method below?

   static long insert(String tableName, String csvFilePath) throws Exception {
        ClickHouseNode server = ClickHouseNode.builder()
                .host(System.getProperty("chHost", "localhost"))
                .port(ClickHouseProtocol.HTTP, Integer.getInteger("chPort", 8123))
                .database("ShabirBhat_project")
                .addOption("connect_timeout", String.valueOf(1350 * 1000))
                .addOption("socket_timeout", String.valueOf(1350 * 1000))
                .credentials(ClickHouseCredentials.fromUserAndPassword(
                        System.getProperty("chUser", "default"), System.getProperty("chPassword", "")))
                .build();
        try (BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream(csvFilePath))) {
            ClickHousePassThruStream stream = ClickHousePassThruStream.of(
                    bufferedInputStream,
                    ClickHouseCompression.NONE,
                    0,
                    ClickHouseFormat.CSVWithNames
            );
            CompletableFuture<ClickHouseResponseSummary> load = ClickHouseClient.load(server, tableName, stream);
            ClickHouseResponseSummary clickHouseResponse = load.get();
            return clickHouseResponse.getWrittenRows();
        } catch (Exception e) {
            System.out.println("Failed to load");
            e.printStackTrace();
            // preserve the cause instead of throwing a bare Exception
            throw new Exception("Failed to load " + csvFilePath, e);
        }
    }

shabir1 avatar Aug 14 '23 10:08 shabir1
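One possible way to raise the limit per connection (a sketch, not confirmed against every client version): the Java client has a custom_settings option whose value is a comma-separated list of server settings passed along with each query, and max_memory_usage / max_memory_usage_for_user take byte values. The helper below only builds that option string so it can be checked offline; the addOption call is shown as a comment because it needs a live server, and the custom_settings key should be verified against the clickhouse-java version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class CustomSettingsDemo {
    // Builds the comma-separated "key=value" list expected by the
    // client's custom_settings option (option key assumed; check
    // your clickhouse-java version).
    static String build(Map<String, String> settings) {
        return settings.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        Map<String, String> s = new LinkedHashMap<>();
        s.put("max_memory_usage", "10737418240");          // 10 GiB, in bytes
        s.put("max_memory_usage_for_user", "10737418240"); // also bytes
        String opt = build(s);
        System.out.println(opt);
        // Hypothetically, on the node builder from the snippet above:
        // ClickHouseNode.builder()
        //         .addOption("custom_settings", opt)
        //         ...
    }
}
```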

@mshustov @mzitnik If the file is 5 GB but has few columns (fewer than 10), the above method works fine. But if the file is only 200 MB with a huge number of columns (more than 800), it fails with memory issues.

shabir1 avatar Aug 22 '23 07:08 shabir1