feat: add object cache for stage parquet file

Open • bohutang opened this issue 2 years ago • 5 comments

Summary

Currently, the Databend object cache only supports FUSE engine tables, so a query like this cannot be accelerated:

select * from 's3://aa/bb/cc/' (pattern => '.*[.]parquet')

We could extend the cache to cover stage files in Parquet format.

bohutang, Sep 09 '23 02:09

Since you are working on the new parquet crate, cc @youngsofun @RinChanNOWWW

bohutang, Sep 09 '23 02:09

This is a low-priority issue.

The higher priority is finding a way to cache the metadata of stage Parquet files (in memory?), similar to the approach discussed for DuckDB in "In-memory cache of Parquet data?" and "Persistently cache Parquet metadata".
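A minimal sketch of what an in-memory metadata cache could look like, in Rust since Databend is written in Rust. ParquetMeta, MetaKey, MetaCache, and get_or_load are hypothetical names and are not Databend's FileMetaDataCache API. The key assumes the object store reports a version tag such as an ETag, and the loader closure stands in for reading and decoding the Parquet footer.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for decoded Parquet footer metadata.
#[derive(Debug, Clone, PartialEq)]
struct ParquetMeta {
    num_rows: u64,
    num_row_groups: usize,
}

/// Hypothetical cache key: object path plus a version tag (assumed to be an ETag).
/// A rewritten file gets a new tag, so it maps to a new entry instead of a stale hit.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct MetaKey {
    path: String,
    etag: String,
}

#[derive(Default)]
struct MetaCache {
    entries: Mutex<HashMap<MetaKey, Arc<ParquetMeta>>>,
}

impl MetaCache {
    /// Returns cached metadata, or runs `load` on a miss and stores the result.
    /// The lock is held while loading, which keeps the sketch simple but serializes misses.
    fn get_or_load<F>(&self, key: &MetaKey, load: F) -> Arc<ParquetMeta>
    where
        F: FnOnce() -> ParquetMeta,
    {
        let mut entries = self.entries.lock().unwrap();
        entries
            .entry(key.clone())
            .or_insert_with(|| Arc::new(load()))
            .clone()
    }
}

fn main() {
    let cache = MetaCache::default();
    let mut footer_reads = 0;
    let key = MetaKey {
        path: "s3://aa/bb/cc/part-0.parquet".into(),
        etag: "v1".into(),
    };

    // Three "queries" over the same file: only the first one reads the footer.
    for _ in 0..3 {
        let meta = cache.get_or_load(&key, || {
            footer_reads += 1;
            ParquetMeta { num_rows: 1000, num_row_groups: 2 }
        });
        assert_eq!(meta.num_rows, 1000);
    }
    println!("footer reads: {}", footer_reads);
}
```

Running main prints "footer reads: 1": the later lookups reuse the decoded metadata instead of fetching the footer again.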

bohutang, Sep 09 '23 02:09

It looks like we already have this (FileMetaDataCache), but it does not support stage Parquet files yet.

bohutang, Sep 09 '23 04:09

/assign me

zenus, Nov 03 '23 12:11

Hi @zenus,

I apologize for any confusion this issue may have caused. We should avoid caching stage files: trying to control what the cache holds for them could lead to significant issues, particularly with some write operations.
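The write-related risk can be illustrated with a toy model. Everything here (FileMeta, Store, cached_by_path, cached_by_version) is hypothetical and not Databend code. If cached metadata is keyed only by path, a stage file that is rewritten keeps returning its old metadata. Keying by (path, ETag) avoids that, but it assumes the object store exposes reliable version tags and costs a metadata request on every lookup.

```rust
use std::collections::HashMap;

/// Hypothetical metadata for a staged Parquet file.
#[derive(Debug, Clone, PartialEq)]
struct FileMeta {
    num_rows: u64,
}

/// Toy object store: path -> (etag, metadata of the current object).
struct Store {
    objects: HashMap<String, (String, FileMeta)>,
}

impl Store {
    /// Stand-in for a HEAD request returning the object's current version tag.
    fn head(&self, path: &str) -> &str {
        &self.objects[path].0
    }
    /// Stand-in for reading and decoding the footer.
    fn read_meta(&self, path: &str) -> FileMeta {
        self.objects[path].1.clone()
    }
    /// Simulates another writer replacing the file.
    fn overwrite(&mut self, path: &str, etag: &str, meta: FileMeta) {
        self.objects.insert(path.to_string(), (etag.to_string(), meta));
    }
}

/// Keyed only by path: never notices that the object changed.
fn cached_by_path(cache: &mut HashMap<String, FileMeta>, store: &Store, path: &str) -> FileMeta {
    cache
        .entry(path.to_string())
        .or_insert_with(|| store.read_meta(path))
        .clone()
}

/// Keyed by (path, etag): a rewrite produces a new key, at the cost of a HEAD per lookup.
fn cached_by_version(
    cache: &mut HashMap<(String, String), FileMeta>,
    store: &Store,
    path: &str,
) -> FileMeta {
    let key = (path.to_string(), store.head(path).to_string());
    cache.entry(key).or_insert_with(|| store.read_meta(path)).clone()
}

fn main() {
    let path = "s3://aa/bb/cc/part-0.parquet";
    let mut store = Store { objects: HashMap::new() };
    store.overwrite(path, "v1", FileMeta { num_rows: 1000 });

    // Warm both caches with the original file.
    let mut by_path = HashMap::new();
    let mut by_version = HashMap::new();
    cached_by_path(&mut by_path, &store, path);
    cached_by_version(&mut by_version, &store, path);

    // The file is rewritten after both caches were filled.
    store.overwrite(path, "v2", FileMeta { num_rows: 5 });

    println!("path-keyed:    {:?}", cached_by_path(&mut by_path, &store, path));
    println!("version-keyed: {:?}", cached_by_version(&mut by_version, &store, path));
}
```

After the rewrite, the path-keyed lookup still reports 1000 rows while the version-keyed lookup reports 5.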

bohutang, Nov 25 '23 06:11