greptimedb Bulk load for greptimedb

Bulk load data from sources, such as:

csv file
json file
parquet file
other tables
mysql table
....

Nov 07 '22 00:11 killme2008

I investigated bulk loading parquet files last week. As parquet is our only natively supported format, we only need to supply a manifest and our specific metadata (in persistent storage and in the meta server) to make parquet files query-able and even writable.

But what about other formats like csv or json? They cannot be directly queried (for now). I came up with two approaches:

an offline converter that converts other formats into parquet, after which we ingest the converted parquet file.
add support for those formats.
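
As a rough illustration of the front half of either approach, here is a minimal sketch that parses CSV or line-delimited JSON into one common row shape and then pivots it into columns. The function names `read_rows` and `to_columns` are hypothetical, not GreptimeDB APIs, and the step that actually writes parquet is left out because it needs a third-party library.

```python
import csv
import io
import json


def read_rows(text, fmt):
    """Parse CSV or line-delimited JSON text into a list of dict rows."""
    if fmt == "csv":
        # DictReader takes column names from the first line; values stay strings.
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    if fmt == "json":
        # Assumption: one JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {fmt}")


def to_columns(rows):
    """Pivot rows into a column-oriented dict, similar to a columnar file layout."""
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    # Missing values become None so every column has the same length.
    return {name: [row.get(name) for row in rows] for name in names}
```

Once every input format is reduced to the same columnar shape, a single writer can handle all of them, which is the appeal of converting everything to parquet.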

Nov 07 '22 06:11 waynexia

make parquet files query-able and even writable.

And in a cluster we should have to split the file according to the table's partition rule as well? This is better done in the frontend via some custom SQL like COPY INTO.

And let the frontend deal with more formats like csv or json. We can convert them to parquet internally.
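
To make the splitting step concrete, here is a minimal sketch that groups rows by a range partition rule before they are written out. The thread does not describe the actual partition rule, so the range-on-one-column rule, the `bounds` list, and the function name `split_by_partition` are all assumptions for illustration.

```python
import bisect
from collections import defaultdict


def split_by_partition(rows, column, bounds):
    """Group rows by range partition.

    `bounds` is a sorted list of exclusive upper bounds: partition i holds
    values below bounds[i], and the last partition holds everything else.
    """
    parts = defaultdict(list)
    for row in rows:
        # bisect_right gives the index of the first bound greater than the value.
        parts[bisect.bisect_right(bounds, row[column])].append(row)
    return dict(parts)
```

With `bounds = [10, 20]`, values below 10 go to partition 0, values from 10 up to but not including 20 go to partition 1, and the rest go to partition 2. Each group could then be written as its own file for the node that owns that partition.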

Nov 10 '22 16:11 sunng87

And in a cluster we should have to split the file according to the table's partition rule as well?

Yes. We can let the frontend preprocess (split) the file and upload all the resulting pieces to OSS.

And let the frontend deal with more formats like csv or json. We can convert them to parquet internally.

I also prefer to convert other formats to parquet. Supporting them directly is not complex, but considering possible future modifications, it would be better to unify the format.

Nov 11 '22 04:11 waynexia

Already implemented in #1038 and #1064.

May 08 '23 07:05 killme2008