databend
Improvement: abandon internal patches of parquet2
Summary
We have two internal patches of parquet2, which mainly address the following requirement:
- acquire the parquet file meta right after the parquet file has been written, without re-reading the file
It works, but awkwardly: each time we sync with upstream (the official parquet2), there is extra work to do (rebasing, resolving potential conflicts, ...).
Among the new features that parquet2 has introduced recently, the following two seem able to address this requirement:
- https://github.com/jorgecarleitao/parquet2/pull/147
- https://github.com/jorgecarleitao/parquet2/pull/148
Thus:
- we should replace our own internal patches with the new APIs that parquet2 exposes;
- and pin the parquet2 cargo dependency to the rev of an official parquet2 commit.
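A minimal sketch of what pinning the dependency could look like in Cargo.toml, using Cargo's standard git dependency syntax. The commit hash below is a hypothetical placeholder (this issue does not name a specific rev), and any crate features databend may enable are omitted.

```toml
[dependencies]
# Track the official upstream repository instead of an internal fork.
# "<upstream-commit-sha>" is a placeholder: replace it with the actual
# upstream commit that contains the new metadata APIs.
parquet2 = { git = "https://github.com/jorgecarleitao/parquet2", rev = "<upstream-commit-sha>" }
```

Pinning to a `rev` (rather than a branch) keeps builds reproducible until the dependency is deliberately bumped.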
Let's go upstream first!
The internal parquet2 patches are not completely abandoned yet (they are kept for data format backward compatibility). After all the old data has been migrated, we should switch to the upstream parquet2.
Is it time to remove https://github.com/datafuselabs/parquet2 and https://github.com/datafuse-extras/parquet2?