icedb
An in-process Parquet merge engine for better data warehousing in S3
Have a mode where it operates off disk instead of S3, for setups such as AWS FSx for Lustre. That might provide better managed performance, especially for reading the...
Can insert an Avro record; since the record includes its schema, that schema will be used for the insert. It will be used both for setting initial table types, as well...
Compare ingestion and queries on the same data set to:
- [ ] BigQuery
- [ ] Athena
- [ ] ClickHouse
- [ ] MotherDuck
- [ ] Parquet...
Over 90% of the download time is spent waiting for time to first byte (avg 86 ms in-region on AWS). If we batch these downloads and buffer in memory, it should...
GitHub events has 232M rows and lots of example queries: https://ghe.clickhouse.tech/. The NYC taxi dataset (https://clickhouse.com/docs/en/getting-started/example-datasets/nyc-taxi) has 3B rows but is smaller in size and has a much less complex schema, so we can do this one too. Should...
When performing partition rewrite, we should be able to specify how many files will be processed concurrently.
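One way such a concurrency setting could behave is sketched below. This is not icedb's API: `process_files`, `rewrite_one`, and `max_concurrency` are hypothetical names used only to illustrate capping how many files are in flight at once with a standard-library thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_files(
    files: Iterable[T],
    rewrite_one: Callable[[T], R],  # hypothetical per-file rewrite step
    max_concurrency: int = 4,       # hypothetical knob: files processed at once
) -> List[R]:
    """Apply rewrite_one to every file, at most max_concurrency at a time."""
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    # The pool size is the concurrency limit: no more than
    # max_concurrency calls to rewrite_one run simultaneously.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map returns results in the same order as the input files.
        return list(pool.map(rewrite_one, files))
```

Threads suit this case because the per-file work is expected to be dominated by network I/O; a CPU-bound rewrite step would need a different executor.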
Show how to do partition removal, with an example of TTL and an example of user ID deletion.
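A rough sketch of the TTL case, under stated assumptions: partitions are taken to be strings such as `u=user_a/d=2023-08-01`, where the `d=` segment holds an ISO date. That layout, and the helper name `expired_partitions`, are hypothetical and not taken from icedb itself. The function only selects which partitions are past the TTL; the actual removal is left to whatever mechanism the engine provides.

```python
from datetime import date, timedelta
from typing import List, Optional


def expired_partitions(
    partitions: List[str],
    ttl_days: int,
    today: Optional[date] = None,
) -> List[str]:
    """Return the partitions whose d=<ISO date> segment is older than the TTL."""
    today = today or date.today()
    cutoff = today - timedelta(days=ttl_days)
    expired = []
    for partition in partitions:
        for segment in partition.split("/"):
            key, _, value = segment.partition("=")
            if key == "d":  # assumed date segment, e.g. d=2023-08-01
                if date.fromisoformat(value) < cutoff:
                    expired.append(partition)
                break
    return expired
```

Partitions without a `d=` segment are never selected, which keeps the helper conservative: it removes only what it can positively identify as expired.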
Make an example showing how to filter out a given user ID's data. Related to #106.
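The core of such an example might look like the sketch below, which drops one user's rows while streaming the rest through unchanged. It works on plain dictionaries rather than Parquet files, and the `user_id` column name and the `without_user` helper are assumptions for illustration only.

```python
from typing import Any, Dict, Iterable, Iterator


def without_user(
    rows: Iterable[Dict[str, Any]],
    user_id: str,
    user_column: str = "user_id",  # assumed column name
) -> Iterator[Dict[str, Any]]:
    """Yield every row that does not belong to the given user."""
    for row in rows:
        # Rows missing the column are kept, since they cannot be attributed
        # to the user being removed.
        if row.get(user_column) != user_id:
            yield row


rows = [
    {"user_id": "a", "event": "click"},
    {"user_id": "b", "event": "view"},
    {"user_id": "a", "event": "purchase"},
]
kept = list(without_user(rows, "a"))
```

In a real rewrite the same predicate would be applied while rewriting each affected file, so that the rewritten output no longer contains the user's rows.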
According to https://github.com/apache/arrow-datafusion-python/issues/442#issuecomment-1685809731 it should be able to do this with https://github.com/danthegoodman1/IceDBS3Proxy