clickhouse-java
Insert function is not working
I am not able to insert CSV data into an existing table in ClickHouse using clickhouse-java.
I ran the code below, but it throws the following error:
static long insert(ClickHouseNode server, String table) throws ClickHouseException {
    try (ClickHouseClient client = ClickHouseClient.newInstance(server.getProtocol())) {
        ClickHouseRequest.Mutation request = client.connect(server).write().table(table)
                .format(ClickHouseFormat.RowBinary);
        ClickHouseConfig config = request.getConfig();
        CompletableFuture<ClickHouseResponse> future;
        // back-pressuring is not supported, you can adjust the first two arguments
        try (ClickHousePipedOutputStream stream = ClickHouseDataStreamFactory.getInstance()
                .createPipedOutputStream(config, (Runnable) null)) {
            // in async mode, which is default, execution happens in a worker thread
            future = request.data(stream.getInputStream()).execute();
            // writing happens in main thread
            for (int i = 0; i < 10_000; i++) {
                BinaryStreamUtils.writeString(stream, String.valueOf(i % 16));
                BinaryStreamUtils.writeNonNull(stream);
                BinaryStreamUtils.writeString(stream, UUID.randomUUID().toString());
            }
        }
        // response should always be closed
        try (ClickHouseResponse response = future.get()) {
            ClickHouseResponseSummary summary = response.getSummary();
            return summary.getWrittenRows();
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw ClickHouseException.forCancellation(e, server);
    } catch (ExecutionException | IOException e) {
        throw ClickHouseException.of(e, server);
    }
}
Error
java.util.concurrent.ExecutionException: com.clickhouse.client.ClickHouseException: Code: 33. DB::Exception: Cannot read all data. Bytes read: 2. Bytes expected: 4.: (at row 14272)
: While executing BinaryRowInputFormat. (CANNOT_READ_ALL_DATA) (version 23.6.2.18 (official build))
, server ClickHouseNode [uri=http://localhost:8123/ShabirBhat_project]@-2130346309
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
    at com.datoin.ch.CHclientTest.insert(CHclientTest.java:77)
    at com.datoin.ch.CHclientTest.main(CHclientTest.java:40)
Caused by: com.clickhouse.client.ClickHouseException: Code: 33. DB::Exception: Cannot read all data. Bytes read: 2. Bytes expected: 4.: (at row 14272)
: While executing BinaryRowInputFormat. (CANNOT_READ_ALL_DATA) (version 23.6.2.18 (official build))
, server ClickHouseNode [uri=http://localhost:8123/ShabirBhat_project]@-2130346309
    at com.clickhouse.client.ClickHouseException.of(ClickHouseException.java:168)
    at com.clickhouse.client.AbstractClient.lambda$execute$0(AbstractClient.java:275)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Code: 33. DB::Exception: Cannot read all data. Bytes read: 2. Bytes expected: 4.: (at row 14272)
: While executing BinaryRowInputFormat. (CANNOT_READ_ALL_DATA) (version 23.6.2.18 (official build))
    at com.clickhouse.client.http.HttpUrlConnectionImpl.checkResponse(HttpUrlConnectionImpl.java:184)
    at com.clickhouse.client.http.HttpUrlConnectionImpl.post(HttpUrlConnectionImpl.java:227)
    at com.clickhouse.client.http.ClickHouseHttpClient.send(ClickHouseHttpClient.java:124)
    at com.clickhouse.client.AbstractClient.sendAsync(AbstractClient.java:161)
    at com.clickhouse.client.AbstractClient.lambda$execute$0(AbstractClient.java:273)
    ... 4 more
Process finished with exit code 0
I tried to replace the code below so that it reads from my CSV file:
// writing happens in main thread
for (int i = 0; i < 10_000; i++) {
    BinaryStreamUtils.writeString(stream, String.valueOf(i % 16));
    BinaryStreamUtils.writeNonNull(stream);
    BinaryStreamUtils.writeString(stream, UUID.randomUUID().toString());
}
Replaced with:
BufferedReader reader = new BufferedReader(new FileReader(csvFilePath));
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
    BinaryStreamUtils.writeString(stream, line);
}
Hi @shabir1, apologies for the confusing methods. BinaryStreamUtils.writeString only works for writing a String in the RowBinary data format. It looks like this is a duplicate of #1402? Let me know if you want to close this and carry on the discussion there.
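If you want to stream a CSV file with this low-level API, declare the format as CSV and copy the raw file bytes through the piped stream instead of calling BinaryStreamUtils. A minimal, untested sketch (insertCsv and csvFilePath are illustrative names; it assumes the CSV columns line up with the table definition):

static long insertCsv(ClickHouseNode server, String table, String csvFilePath) throws ClickHouseException {
    try (ClickHouseClient client = ClickHouseClient.newInstance(server.getProtocol())) {
        // declare CSV so the server parses the stream as CSV rows, not RowBinary
        ClickHouseRequest.Mutation request = client.connect(server).write().table(table)
                .format(ClickHouseFormat.CSV);
        ClickHouseConfig config = request.getConfig();
        CompletableFuture<ClickHouseResponse> future;
        try (ClickHousePipedOutputStream stream = ClickHouseDataStreamFactory.getInstance()
                .createPipedOutputStream(config, (Runnable) null)) {
            future = request.data(stream.getInputStream()).execute();
            // copy the file bytes as-is; no per-value binary encoding is needed for CSV
            java.nio.file.Files.copy(java.nio.file.Paths.get(csvFilePath), stream);
        }
        // response should always be closed
        try (ClickHouseResponse response = future.get()) {
            return response.getSummary().getWrittenRows();
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw ClickHouseException.forCancellation(e, server);
    } catch (ExecutionException | IOException e) {
        throw ClickHouseException.of(e, server);
    }
}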
@shabir1 did you manage?
@mzitnik I used the method below to upload a local file to ClickHouse.
static long insert(String tableName, String csvFilePath) throws Exception {
    ClickHouseNode server = getServer();
    try (BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream(csvFilePath))) {
        ClickHousePassThruStream stream = ClickHousePassThruStream.of(
                bufferedInputStream,
                ClickHouseCompression.NONE,
                0,
                ClickHouseFormat.CSVWithNames
        );
        CompletableFuture<ClickHouseResponseSummary> load = ClickHouseClient.load(server, tableName, stream);
        ClickHouseResponseSummary clickHouseResponse = load.get();
        return clickHouseResponse.getWrittenRows();
    } catch (Exception e) {
        System.out.println("Failed to load");
        e.printStackTrace();
        throw new Exception("Failed to load " + csvFilePath, e); // keep the cause instead of discarding it
    }
}
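Note that ClickHousePassThruStream does what the name says: the client passes the bytes to the server unmodified, and the format and compression arguments only tell the server how to parse them, so with CSVWithNames the first line of the file is consumed as the header. A hypothetical call site (table name and path are made up):

// hypothetical usage, assuming the target table already exists
long written = insert("events", "/tmp/events.csv");
System.out.println("Rows written: " + written);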
But on huge files it failed with memory issues; most failures occurred on files that contained more than 300 columns.
ERROR [2023-07-27 06:20:36,451] com.datoin.library.dataset.sql.utility.commons.abstracts.sql.AbstractDBEngine: Failed to load dataset
java.util.concurrent.ExecutionException: com.clickhouse.client.ClickHouseException: Code: 241. DB::Exception: Memory limit (total) exceeded: would use 3.36 GiB (attempt to allocate chunk of 4360384 bytes), maximum: 3.24 GiB. OvercommitTracker decision: Query was selected to stop by OvercommitTracker. (MEMORY_LIMIT_EXCEEDED) (version 23.6.2.18 (official build))
, server ClickHouseNode [uri=http://localhost:8123/cayleytest95814, options={connect_timeout=5000000,socket_timeout=5000000}]@75218225
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
@shabir1 From the error it looks like a ClickHouse server memory limit issue. Did you try to raise the memory limit?
@mzitnik How do I set the memory limit in the above method? And if I set max_memory_usage_for_user = 10, is that 10 bytes, MB, or GB?
@shabir1 It is bytes.
@mzitnik How do I set memory settings such as max_memory_usage_for_user and max_memory_usage in the method below?
static long insert(String tableName, String csvFilePath) throws Exception {
    ClickHouseNode server = ClickHouseNode.builder()
            .host(System.getProperty("chHost", "localhost"))
            .port(ClickHouseProtocol.HTTP, Integer.getInteger("chPort", 8123))
            .database("ShabirBhat_project")
            .addOption("connect_timeout", String.valueOf(1350 * 1000))
            .addOption("socket_timeout", String.valueOf(1350 * 1000))
            .credentials(ClickHouseCredentials.fromUserAndPassword(
                    System.getProperty("chUser", "default"), System.getProperty("chPassword", "")))
            .build();
    try (BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream(csvFilePath))) {
        ClickHousePassThruStream stream = ClickHousePassThruStream.of(
                bufferedInputStream,
                ClickHouseCompression.NONE,
                0,
                ClickHouseFormat.CSVWithNames
        );
        CompletableFuture<ClickHouseResponseSummary> load = ClickHouseClient.load(server, tableName, stream);
        ClickHouseResponseSummary clickHouseResponse = load.get();
        return clickHouseResponse.getWrittenRows();
    } catch (Exception e) {
        System.out.println("Failed to load");
        e.printStackTrace();
        throw new Exception("Failed to load " + csvFilePath, e); // keep the cause
    }
}
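One option (a sketch, not verified against every driver version): skip the ClickHouseClient.load shortcut and build the write request yourself, which lets you attach per-query server settings with ClickHouseRequest.set. insertWithSettings is an illustrative name, and 10737418240 is just 10 GiB expressed in bytes; the server must also permit the user to override these settings:

static long insertWithSettings(ClickHouseNode server, String tableName, String csvFilePath)
        throws ClickHouseException {
    try (ClickHouseClient client = ClickHouseClient.newInstance(server.getProtocol());
            InputStream in = new BufferedInputStream(new FileInputStream(csvFilePath))) {
        ClickHouseRequest.Mutation request = client.connect(server).write()
                .table(tableName)
                .format(ClickHouseFormat.CSVWithNames)
                // per-query server settings; values are in bytes (10 GiB here)
                .set("max_memory_usage", "10737418240")
                .set("max_memory_usage_for_user", "10737418240");
        try (ClickHouseResponse response = request.data(in).execute().get()) {
            return response.getSummary().getWrittenRows();
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw ClickHouseException.forCancellation(e, server);
    } catch (ExecutionException | IOException e) {
        throw ClickHouseException.of(e, server);
    }
}

The same settings can supposedly also be attached at the node level via .addOption("custom_settings", "max_memory_usage_for_user=10737418240"), which the load() shortcut should pick up, but I have not verified that path.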
@mshustov @mzitnik If the file size is 5 GB but the number of columns is small (fewer than 10), the above method works fine. But if the file size is just 200 MB and the number of columns is huge (more than 800), it fails with memory issues.
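A guess at why column count matters more than file size: the server accumulates an insert block per column before writing a part, so with 800+ columns the per-block buffers multiply. If that is the cause, making the server form smaller blocks and disabling parallel parsing should reduce the peak. These are standard ClickHouse settings, but the values below are only illustrative, e.g. replacing the request construction in the sketch above:

ClickHouseRequest.Mutation request = client.connect(server).write()
        .table(tableName)
        .format(ClickHouseFormat.CSVWithNames)
        // flush smaller insert blocks so per-column buffers stay bounded
        .set("min_insert_block_size_rows", "10000")
        .set("min_insert_block_size_bytes", "67108864") // 64 MiB
        // parallel parsing keeps several chunks in memory at once; trade speed for RAM
        .set("input_format_parallel_parsing", "0");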