databend
databend copied to clipboard
Tracking issues of `upstream first` violation
Databend will adopt Upstream First
rules across the whole project, which means:
- We will try our best to keep our dependences in the latest stable version.
- We will try our best to report issues we met upstream.
- We will try our best to contribute back our own changes.
In this tracking issue, we will record the temporary forks that we are using and our plan for them:
-
sqlparser = { git = "https://github.com/datafuse-extras/sqlparser-rs", rev = "7f246e3" }
-
sled = { git = "https://github.com/datafuse-extras/sled", tag = "v0.34.7-datafuse.1", default-features = false }
-
cargo-license = { git = "https://github.com/datafuse-extras/cargo-license", rev = "f1ce4a2" }
- [ ] https://github.com/datafuselabs/databend/issues/6969
-
parquet2 = { version = "0.14.1", optional = true, git = "https://github.com/datafuse-extras/parquet2", rev = "3a468fc3c4" }
- [ ] https://github.com/datafuselabs/databend/issues/6064
-
async-trait = { git = "https://github.com/datafuse-extras/async-trait", rev = "f0b0fd5" }
- [ ] https://github.com/datafuselabs/databend/issues/6968
Already cleaned
-
bincode
-
chrono
: https://github.com/datafuselabs/databend/issues/6928 -
clickhouse-driver
: https://github.com/datafuselabs/databend/issues/6967
cc @andylokandy @leiysky, maybe we can remove sqlparser
after the new planner is ready?
cc @drmingdrmer @lichuang, will we maintain sled as a new upstream? Or we will use openkv
to replace sled
finally?
cc @PsiACE, why we need to use a fork of cargo-license
, async-trait
, clickhouse-driver
and chrono
?
cc @dantengsky, can we migrate parquet2
to upstream instead of our own fork?
cc @drmingdrmer @lichuang, will we maintain sled as a new upstream? Or we will use
openkv
to replacesled
finally?
To me. I'm not sure about this. sled
has not been actively maintained for a while. Its author recently switched to another LSM storage project.
The extra/sled is mainly for easily adding a tag for databend use.
To me. I'm not sure about this.
sled
has not been actively maintained for a while. Its author recently switched to another LSM storage project.
We need a decision here. Possible option are as follows:
- Fork
sled
(like openraft from async-raft) and act as a new upstream, release assled-databend
(maybe) - Switch to other active k-v storage projects (I don't know, maybe rocksdb?)
- Replace
sled
withopenkv
(A roadmap is enough for now)
cc @dantengsky, can we migrate
parquet2
to upstream instead of our own fork?
There is only one thing that prevents us from using the vanilla upstream parquet2: there might be some legacy data that is compressed by the non-standardard lz4.
let me try to make sure that no such data is left, and then switch to upstream parquet2.
let me try to make sure that no such data is left, and then switch to upstream parquet2.
Looks nice!
Please create an issue for it and link to this issue. Thanks!
To me. I'm not sure about this.
sled
has not been actively maintained for a while. Its author recently switched to another LSM storage project.We need a decision here. Possible option are as follows:
- Fork
sled
(like openraft from async-raft) and act as a new upstream, release assled-databend
(maybe)- Switch to other active k-v storage projects
- Replace
sled
withopenkv
(A roadmap is enough for now)
datafuse-extras/sled is already a fork. Do we have to release it as a standalone crate?
Do we have to release it as a standalone crate?
Release it as a new crate looks better to me. Like tikv does for jemalloctor
: https://github.com/tikv/jemallocator
So people in the rust community can benefit from our fork too.
And if we want to maintain sled
, we need to set up the lowest bar like CI must pass, changes must with tests, etc.
Do we have to release it as a standalone crate?
Release it as a new crate looks better to me. Like tikv does for
jemalloctor
: https://github.com/tikv/jemallocatorSo people in the rust community can benefit from our fork too.
And if we want to maintain
sled
, we need to set up the lowest bar like CI must pass, changes must with tests, etc.
There is no plan to introduce any improvement to sled
yet:(.
The global singleton model in sled
is not easy to use or test.
And the transaction model is also half mature.
why we need to use a fork of
cargo-license
,async-trait
,clickhouse-driver
andchrono
?
-
cargo-license
: It's not as aggressively maintained, of course, it is perfectly possible to use specific commits from upstream. I will deal with it, the only problem is that there is no new version of the cargo-license released, I will consider communicating with its maintainer about this. -
async-trait
: We forked a fork that someone else was maintaining and assync-trait refused to accept the change, perhaps consider suggesting that the upstream maintainer release a new crate. -
chrono
Good news, chrono has some new maintainers and we can use 0.4.20-rc.1 -
clickhouse-driver
It's not as aggressively maintained, I will move it to opensrv later to maintain it and release a new crate.
As for sled, if we still plan to use it, perhaps we can consider refactoring it to meet our needs. If we decide to abandon it, emm, for the time being, the Rust ecosystem may lack a really strong alternative.
I am not opposed to using rocksdb or implementing openkv, nor am I opposed to looking into the availability of redb/persy.
chrono
Good news, chrono has some new maintainers and we can use 0.4.20-rc.1
Great, it's nice to know chrono
come back again!
Awesome practice. How about give a lecture!
Done.
Done.
Not yet. We must address sqlparser
and sled
, which are still going on.
@PsiACE do you know why we need to keep our own arrow-format
?
@PsiACE do you know why we need to keep our own
arrow-format
?
We need support for higher versions of tonic and prost.
We need support for higher versions of tonic and prost.
Ok, I got it. Related to https://github.com/DataEngineeringLabs/arrow-format/pull/5
@PsiACE do you know why we need to keep our own
arrow-format
?
This has been addressed by https://github.com/datafuselabs/databend/pull/8785.
Thanks @GPSnoopy @jorgecarleitao