icedb

An in-process Parquet merge engine for better data warehousing in S3

Results 32 icedb issues

Add a mode that operates off local disk instead of S3, for setups that use something like AWS FSx for Lustre. That might provide better managed performance, especially for reading the...

enhancement

Use latest version, and use latest credentials methods for S3

enhancement

Allow inserting an Avro record. Since the record includes its schema, that schema will be used for the insert. It will be used both for setting initial table types, as well...

documentation
enhancement

Compare ingestion and queries on the same data set to:
- [ ] BigQuery
- [ ] Athena
- [ ] ClickHouse
- [ ] MotherDuck
- [ ] Parquet...

documentation
help wanted

90+% of the download time is waiting for time to first byte (avg 86ms in region on AWS). If we do batches of these and buffer in memory it should...

enhancement
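
When time to first byte dominates, issuing requests concurrently lets the waits overlap. The following is a minimal sketch of that idea only, not icedb's implementation: `fetch_batch` and its `fetch_one` callback are hypothetical names, and `fetch_one` stands in for whatever S3 GET call is actually used.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fetch_batch(
    keys: Iterable[str],
    fetch_one: Callable[[str], bytes],  # hypothetical: wraps a single object GET
    max_workers: int = 16,
) -> Dict[str, bytes]:
    """Fetch many objects concurrently and buffer their bodies in memory."""
    keys = list(keys)
    if not keys:
        return {}
    # Each worker thread spends most of its time waiting on the first byte,
    # so running the requests in parallel overlaps that latency.
    with ThreadPoolExecutor(max_workers=min(max_workers, len(keys))) as pool:
        bodies = list(pool.map(fetch_one, keys))  # map preserves input order
    return dict(zip(keys, bodies))
```

Buffering every body in memory trades RAM for latency, so the batch size would need to be bounded by the expected file sizes.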

GitHub events has 232M rows and lots of example queries: https://ghe.clickhouse.tech/. The NYC taxi dataset (https://clickhouse.com/docs/en/getting-started/example-datasets/nyc-taxi) has 3B rows but is smaller in size and has a much less complex schema, so we can do this too. Should...

documentation

When performing a partition rewrite, we should be able to specify how many files are processed concurrently.

enhancement
help wanted

Show how to do partition removal. Include an example of TTL-based removal and an example of user ID deletion.

help wanted
example
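
As a rough illustration of the selection step only, the sketch below picks which partitions to remove for a TTL or for a given user. It assumes a hypothetical partition layout with `u=<user id>` and `d=YYYY-MM-DD` path segments; the function names are made up for this example and it does not call any icedb API.

```python
from datetime import date, timedelta
from typing import List


def expired_partitions(partitions: List[str], today: date, ttl_days: int) -> List[str]:
    """Return partitions whose (assumed) d=YYYY-MM-DD segment is older than the TTL."""
    cutoff = today - timedelta(days=ttl_days)
    expired = []
    for partition in partitions:
        for segment in partition.split("/"):
            # Hypothetical layout: the date lives in a "d=" path segment.
            if segment.startswith("d=") and date.fromisoformat(segment[2:]) < cutoff:
                expired.append(partition)
                break
    return expired


def partitions_for_user(partitions: List[str], user_id: str) -> List[str]:
    """Return partitions that contain the (assumed) u=<user id> segment."""
    target = f"u={user_id}"
    return [p for p in partitions if target in p.split("/")]
```

The returned lists would then be handed to whatever removal mechanism the table provides.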

Make an example showing how to filter out a given user ID's data. Related to #106.

help wanted
example

According to https://github.com/apache/arrow-datafusion-python/issues/442#issuecomment-1685809731 it should be able to do this with https://github.com/danthegoodman1/IceDBS3Proxy

help wanted
example