clickhouse-java icon indicating copy to clipboard operation
clickhouse-java copied to clipboard

Importing csv file into table

Open kiwimg opened this issue 1 year ago • 5 comments

How to import CSV files into a table?

v0.3.2-patch10,use “ com.clickhouse.jdbc.ClickHouseStatement” Is this correct? clickHouseStatement.write() // Write API entrypoint .table(endPiont).format(ClickHouseFormat.CSV)// where to write data .data(file.getAbsolutePath(), ClickHouseCompression.ZIP)/// specify input .send();

ru.yandex.clickhouse.* will Deprecated

Before 0.3.2... import ru.yandex.clickhouse.ClickHouseStatement; ClickHouseStatement sth = connection.createStatement(); sth .write() // Write API entrypoint .table("default.my_table") // where to write data .option("format_csv_delimiter", ";") // specific param .data(new File("/path/to/file.csv.gz"), ClickHouseFormat.CSV, ClickHouseCompression.gzip) // specify input .send();

kiwimg avatar Jul 16 '22 03:07 kiwimg

How to import large CSV files quickly, What is a good implementation method, thanks

kiwimg avatar Jul 18 '22 04:07 kiwimg

How to import large CSV files quickly, What is a good implementation method, thanks

Sorry for the late reply. ClickHouseFile was added for this but it's only half baked. I think I can quickly fix that when I get home. You'll be able to use below code starting from patch11:

// one-liner
ClickHouseClient.load(
    ClickHouseNode.of("http://localhost:8123/system"),
    "table_a", 
    // will parse file name later so that you don't have to specify compression and format
    ClickHouseFile.of("/Users/zhicwu/a.csv.gz", ClickHouseCompression.GZIP, 0, ClickHouseFormat.CSV)).get();

// JDBC
Statement stmt = conn.createStatement();
stmt.unwrap(ClickHouseRequest).write().data(ClickHouseFile.of(...)).table("mytable").executeAndWait();

zhicwu avatar Jul 18 '22 07:07 zhicwu

Very good!thanks!!

Can GZ files be automatically compressed? For example, the CSV source files I import are automatically compressed by the interface patch11: When will it be released? I expect the import speed to be fast and the memory consumption to be smal

kiwimg avatar Jul 18 '22 07:07 kiwimg

@kiwimg, since patch11 has been released, you may refer to this for direct file loading, which should be similar to use curl. If you want Java client to take care of compression when uploading file, you just need to set input stream and compression algorithm in ClickHouseRequest.

I'm still thinking a more consistent way of doing this so I changed milestone to 0.3.3.

zhicwu avatar Aug 02 '22 01:08 zhicwu

@kiwimg, since patch11 has been released, you may refer to this for direct file loading, which should be similar to use curl. If you want Java client to take care of compression when uploading file, you just need to set input stream and compression algorithm in ClickHouseRequest.

I'm still thinking a more consistent way of doing this so I changed milestone to 0.3.3.

Thank you very much. You're really great

kiwimg avatar Aug 02 '22 11:08 kiwimg

what's difference between the so called "direct file loading" with the java client ClickhouseClient#load(ClickHouseNode server, String table, ClickHouseFormat format, ClickHouseCompression compression, String file) method @zhicwu

chigend avatar Dec 30 '22 11:12 chigend

The API was enhanced for scheme inferring although it's limited to a few file extensions. For example, passing file name a.csv.lz4 will be treated as LZ4 compressed file in CSV format.

@chigend, sorry for the late reply. To help you understand, let's start with an example. Assume you want to load a.csv.lz4 into ClickHouse. It's ridiculous to uncompress the file first and then load the CSV into ClickHouse, right? It's more direct to load the compressed file directly into the database because ClickHouse simply supports that :)

zhicwu avatar Jan 19 '23 12:01 zhicwu

thank you @zhicwu

chigend avatar Jan 30 '23 06:01 chigend