Checklist for v0.3.0 release
A couple of things we need for a v0.3.0 release:
- [x] Upgrade DuckDB to v1.3.0
- [ ] DuckDB moves most of the HTTP-related code from the httpfs extension to the main repo; check whether we keep httpfs as a third-party library (checked in via git submodule) or directly use `HttpClient`
- [x] Revive filesystem wrapping logic; the required upstream changes have been released in v1.3.0
- [x] Update the `Glob` implementation and metadata cache, which have been changed upstream
- [ ] Check whether we could leverage `FileOpenInfo`, which is returned from the glob operation

References:
- DuckDB puts HTTP into main repo: https://github.com/duckdb/duckdb/pull/17464
- Feature request for filesystem wrapping: https://github.com/dentiny/duck-read-cache-fs/issues/190
- Feature request to update `Glob`: https://github.com/dentiny/duck-read-cache-fs/issues/192
Thanks for the summary @dentiny. Are you planning to align your extension with External File Cache (https://github.com/duckdb/duckdb/pull/16463)?
Thank you @serge-melnik for the interest! I'm aware of the DuckDB cache, and I think there are a few differences between the extension and the built-in cache.
The built-in cache takes a different caching strategy: it's request-based rather than block-based. I'm not convinced request-based is the best solution; for example, guessing the parquet footer might lead to overlapping requests: https://github.com/duckdb/duckdb/pull/17300
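To make the block-based vs. request-based distinction concrete, here is a minimal sketch in Python. It is a simplified model, not the actual code of either cache: the 64 KiB block size, the helper names `block_keys` and `request_key`, and the two example read ranges are all assumptions chosen for illustration.

```python
BLOCK_SIZE = 64 * 1024  # hypothetical block size, chosen only for this example


def block_keys(offset, length, block_size=BLOCK_SIZE):
    """Return the aligned (start, size) blocks covering [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return [(i * block_size, block_size) for i in range(first, last + 1)]


def request_key(offset, length):
    """A request-based cache keys each entry by the exact range that was read."""
    return (offset, length)


# Two overlapping reads at the tail of a 1 MiB file, e.g. a guessed footer
# read followed by a larger read once the real footer size is known.
file_size = 1024 * 1024
guess = (file_size - 16 * 1024, 16 * 1024)  # last 16 KiB
exact = (file_size - 40 * 1024, 40 * 1024)  # last 40 KiB

# Block-based: both reads map onto the same aligned block, so one entry.
block_cache = set(block_keys(*guess)) | set(block_keys(*exact))

# Request-based: the ranges differ, so each read becomes its own entry.
request_cache = {request_key(*guess), request_key(*exact)}

# Bytes that are stored twice in the request-based model.
overlap_bytes = max(
    0,
    min(guess[0] + guess[1], exact[0] + exact[1]) - max(guess[0], exact[0]),
)

print(len(block_cache), len(request_cache), overlap_bytes)
```

In this model the block-based cache holds a single block for both reads, while the request-based cache holds two entries that share the last 16 KiB. The trade-off is that a block-based cache may fetch more bytes than a single request asked for.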
As for integration, I'm considering exposing another feature flag for users who prefer the "external file cache" but still want other features like stats collection. There's no concrete implementation for now.