
NullPointerException while loading Parquet file using ClickHouseClient

Open porechajp opened this issue 1 year ago • 1 comments

Hello,

I am trying to load a Parquet file into a ClickHouse table using com.clickhouse.client.ClickHouseClient. The call fails at the end with the exception below.

Please note that the data does get loaded successfully.

Exception:

Exception in thread "main" java.util.concurrent.ExecutionException: java.lang.NullPointerException: Cannot invoke "com.clickhouse.data.ClickHouseDataProcessor.getInputStream()" because "this.processor" is null
	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
	at com.tnt.dbimex.ChImportApplication.main(ChImportApplication.java:27)
Caused by: java.lang.NullPointerException: Cannot invoke "com.clickhouse.data.ClickHouseDataProcessor.getInputStream()" because "this.processor" is null
	at com.clickhouse.client.ClickHouseStreamResponse.close(ClickHouseStreamResponse.java:94)
	at com.clickhouse.client.ClickHouseClient.lambda$load$8(ClickHouseClient.java:444)
	at com.clickhouse.client.ClickHouseClient.run(ClickHouseClient.java:232)
	at com.clickhouse.client.ClickHouseClient.lambda$submit$4(ClickHouseClient.java:284)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
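
The trace above has the usual shape of a failure inside an asynchronous task: the NullPointerException is thrown on a worker thread while the response is being closed, and CompletableFuture.get() reports it to the caller wrapped in an ExecutionException. The following is a minimal sketch of that pattern using only the JDK. The Response class is a hypothetical stand-in, not the clickhouse-java class, and it only imitates a close() that dereferences a null field.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class AsyncFailureDemo {
    // Hypothetical stand-in for a response whose close() assumes a non-null field.
    static final class Response implements AutoCloseable {
        private final Object processor;

        Response(Object processor) {
            this.processor = processor;
        }

        @Override
        public void close() {
            // Throws NullPointerException when processor is null.
            processor.toString();
        }
    }

    // Runs the "load" asynchronously and returns the cause of the failure, or null on success.
    static Throwable loadAndCaptureCause() {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
            // The work itself succeeds; the failure happens when the resource is closed.
            try (Response response = new Response(null)) {
                return "loaded";
            }
        });
        try {
            future.get();
            return null;
        } catch (ExecutionException e) {
            // get() wraps the exception thrown on the worker thread.
            return e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadAndCaptureCause());
    }
}
```

This mirrors why the data can already be loaded by the time the exception appears: in the sketch the work inside the try block finishes first, and only the cleanup step fails.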

The code is:

	public static void main(String[] args) throws FileNotFoundException, InterruptedException, ExecutionException {

		var node = ClickHouseNode.of("http://127.0.0.1:8123?compress=0"); // ?compress_algorithm=gzip

		var future = ClickHouseClient.load(node, "STG_2.AUDIT_TRAIL",
				ClickHousePassThruStream.of(new FileInputStream("C:/Temp/AUDIT_TRAIL.parquet"),
						ClickHouseCompression.NONE, ClickHouseFormat.Parquet));
	
		
		System.out.println(future.get());
		
	}

After stepping through the code in a debugger, I found that the following code in com.clickhouse.client.ClickHouseStreamResponse does not check whether processor is null:

    @Override
    public void close() {
        final ClickHouseInputStream input = processor.getInputStream();
        if (closed || input.isClosed()) {
            return;
        }

Permalink: https://github.com/ClickHouse/clickhouse-java/blob/6a0856fdc4dbfed89dd2d0030f20accdec63bce8/clickhouse-client/src/main/java/com/clickhouse/client/ClickHouseStreamResponse.java#L94C45-L94C54

The processor remains null because, in the constructor of ClickHouseStreamResponse, ClickHouseDataStreamFactory.getInstance().getProcessor returns null for ClickHouseFormat.Parquet.

This is because the following method instantiates a ClickHouseDataProcessor only if the format is RowBinary, RowBinaryWithNamesAndTypes, or a textual format:

    public ClickHouseDataProcessor getProcessor(ClickHouseDataConfig config, ClickHouseInputStream input,
            ClickHouseOutputStream output, Map<String, Serializable> settings, List<ClickHouseColumn> columns)
            throws IOException {
        ClickHouseFormat format = ClickHouseChecker.nonNull(config, ClickHouseDataConfig.TYPE_NAME).getFormat();
        ClickHouseDataProcessor processor = null;
        if (ClickHouseFormat.RowBinary == format || ClickHouseFormat.RowBinaryWithNamesAndTypes == format) {
            processor = new ClickHouseRowBinaryProcessor(config, input, output, columns, settings);
        } else if (format.isText()) {
            processor = new ClickHouseTabSeparatedProcessor(config, input, output, columns, settings);
        }
        return processor;
    }

It looks like the processor matters for read use cases but may not be needed for load use cases, so we can probably introduce a null check in the close method of com.clickhouse.client.ClickHouseStreamResponse.
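
To illustrate the kind of null check suggested here, the following is a rough sketch on simplified stand-in types. It is not the library's code or its actual fix: Processor, StreamResponse, and the rawInput fallback field are all hypothetical names introduced for this example, and the real ClickHouseStreamResponse has more state than is modelled here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class NullSafeCloseSketch {
    // Hypothetical stand-in for ClickHouseDataProcessor.
    interface Processor {
        InputStream getInputStream();
    }

    // Hypothetical stand-in for ClickHouseStreamResponse.
    static final class StreamResponse implements AutoCloseable {
        private final Processor processor; // may be null, e.g. for formats without a processor
        private final InputStream rawInput; // assumed fallback stream when there is no processor
        private boolean closed;

        StreamResponse(Processor processor, InputStream rawInput) {
            this.processor = processor;
            this.rawInput = rawInput;
        }

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            if (closed) {
                return;
            }
            // Only dereference the processor when it exists.
            InputStream input = processor != null ? processor.getInputStream() : rawInput;
            try {
                if (input != null) {
                    input.close();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                closed = true;
            }
        }
    }

    public static void main(String[] args) {
        StreamResponse response = new StreamResponse(null, new ByteArrayInputStream(new byte[0]));
        response.close(); // no NullPointerException even though processor is null
        System.out.println(response.isClosed());
    }
}
```

The design choice in the sketch is to check the closed flag first and to touch the processor only when it is non-null, so closing a response that was created without a processor becomes a plain stream close.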

Note: This issue seems to have been introduced in version 0.4.5; it works fine up to 0.4.4.

porechajp, Dec 03 '23 08:12

I have encountered the same behavior with clickhouse-jdbc-0.5.0.jar. The rows are inserted, but a NullPointerException is thrown at the end of the query.

lee170, Apr 09 '24 19:04