databend icon indicating copy to clipboard operation
databend copied to clipboard

Tracking issues of `upstream first` violation

Open Xuanwo opened this issue 2 years ago • 15 comments

Databend will adopt Upstream First rules across the whole project, which means:

  • We will try our best to keep our dependences in the latest stable version.
  • We will try our best to report issues we met upstream.
  • We will try our best to contribute back our own changes.

In this tracking issue, we will record the temporary forks that we are using and our plan for them:

  • sqlparser = { git = "https://github.com/datafuse-extras/sqlparser-rs", rev = "7f246e3" }
  • sled = { git = "https://github.com/datafuse-extras/sled", tag = "v0.34.7-datafuse.1", default-features = false }
  • cargo-license = { git = "https://github.com/datafuse-extras/cargo-license", rev = "f1ce4a2" }
    • [ ] https://github.com/datafuselabs/databend/issues/6969
  • parquet2 = { version = "0.14.1", optional = true, git = "https://github.com/datafuse-extras/parquet2", rev = "3a468fc3c4" }
    • [ ] https://github.com/datafuselabs/databend/issues/6064
  • async-trait = { git = "https://github.com/datafuse-extras/async-trait", rev = "f0b0fd5" }
    • [ ] https://github.com/datafuselabs/databend/issues/6968

Already cleaned

  • bincode
  • chrono: https://github.com/datafuselabs/databend/issues/6928
  • clickhouse-driver: https://github.com/datafuselabs/databend/issues/6967

Xuanwo avatar Aug 01 '22 02:08 Xuanwo

cc @andylokandy @leiysky, maybe we can remove sqlparser after the new planner is ready?

Xuanwo avatar Aug 01 '22 02:08 Xuanwo

cc @drmingdrmer @lichuang, will we maintain sled as a new upstream? Or we will use openkv to replace sled finally?

Xuanwo avatar Aug 01 '22 02:08 Xuanwo

cc @PsiACE, why we need to use a fork of cargo-license, async-trait, clickhouse-driver and chrono?

Xuanwo avatar Aug 01 '22 02:08 Xuanwo

cc @dantengsky, can we migrate parquet2 to upstream instead of our own fork?

Xuanwo avatar Aug 01 '22 02:08 Xuanwo

cc @drmingdrmer @lichuang, will we maintain sled as a new upstream? Or we will use openkv to replace sled finally?

To me. I'm not sure about this. sled has not been actively maintained for a while. Its author recently switched to another LSM storage project. The extra/sled is mainly for easily adding a tag for databend use.

drmingdrmer avatar Aug 01 '22 02:08 drmingdrmer

To me. I'm not sure about this. sled has not been actively maintained for a while. Its author recently switched to another LSM storage project.

We need a decision here. Possible option are as follows:

  • Fork sled (like openraft from async-raft) and act as a new upstream, release as sled-databend (maybe)
  • Switch to other active k-v storage projects (I don't know, maybe rocksdb?)
  • Replace sled with openkv (A roadmap is enough for now)

Xuanwo avatar Aug 01 '22 02:08 Xuanwo

cc @dantengsky, can we migrate parquet2 to upstream instead of our own fork?

There is only one thing that prevents us from using the vanilla upstream parquet2: there might be some legacy data that is compressed by the non-standardard lz4.
let me try to make sure that no such data is left, and then switch to upstream parquet2.

dantengsky avatar Aug 01 '22 03:08 dantengsky

let me try to make sure that no such data is left, and then switch to upstream parquet2.

Looks nice!

Please create an issue for it and link to this issue. Thanks!

Xuanwo avatar Aug 01 '22 03:08 Xuanwo

To me. I'm not sure about this. sled has not been actively maintained for a while. Its author recently switched to another LSM storage project.

We need a decision here. Possible option are as follows:

  • Fork sled (like openraft from async-raft) and act as a new upstream, release as sled-databend (maybe)
  • Switch to other active k-v storage projects
  • Replace sled with openkv (A roadmap is enough for now)

datafuse-extras/sled is already a fork. Do we have to release it as a standalone crate?

drmingdrmer avatar Aug 01 '22 03:08 drmingdrmer

Do we have to release it as a standalone crate?

Release it as a new crate looks better to me. Like tikv does for jemalloctor: https://github.com/tikv/jemallocator

So people in the rust community can benefit from our fork too.

And if we want to maintain sled, we need to set up the lowest bar like CI must pass, changes must with tests, etc.

Xuanwo avatar Aug 01 '22 03:08 Xuanwo

Do we have to release it as a standalone crate?

Release it as a new crate looks better to me. Like tikv does for jemalloctor: https://github.com/tikv/jemallocator

So people in the rust community can benefit from our fork too.

And if we want to maintain sled, we need to set up the lowest bar like CI must pass, changes must with tests, etc.

There is no plan to introduce any improvement to sled yet:(. The global singleton model in sled is not easy to use or test. And the transaction model is also half mature.

drmingdrmer avatar Aug 01 '22 03:08 drmingdrmer

why we need to use a fork of cargo-license, async-trait, clickhouse-driver and chrono?

  • cargo-license: It's not as aggressively maintained, of course, it is perfectly possible to use specific commits from upstream. I will deal with it, the only problem is that there is no new version of the cargo-license released, I will consider communicating with its maintainer about this.
  • async-trait: We forked a fork that someone else was maintaining and assync-trait refused to accept the change, perhaps consider suggesting that the upstream maintainer release a new crate.
  • chrono Good news, chrono has some new maintainers and we can use 0.4.20-rc.1
  • clickhouse-driver It's not as aggressively maintained, I will move it to opensrv later to maintain it and release a new crate.

PsiACE avatar Aug 01 '22 04:08 PsiACE

As for sled, if we still plan to use it, perhaps we can consider refactoring it to meet our needs. If we decide to abandon it, emm, for the time being, the Rust ecosystem may lack a really strong alternative.

I am not opposed to using rocksdb or implementing openkv, nor am I opposed to looking into the availability of redb/persy.

PsiACE avatar Aug 01 '22 04:08 PsiACE

  • chrono Good news, chrono has some new maintainers and we can use 0.4.20-rc.1

Great, it's nice to know chrono come back again!

Xuanwo avatar Aug 01 '22 04:08 Xuanwo

Awesome practice. How about give a lecture!

aseaday avatar Aug 02 '22 03:08 aseaday

Done.

BohuTANG avatar Sep 06 '22 01:09 BohuTANG

Done.

Not yet. We must address sqlparser and sled, which are still going on.

Xuanwo avatar Sep 06 '22 01:09 Xuanwo

@PsiACE do you know why we need to keep our own arrow-format?

Xuanwo avatar Nov 10 '22 15:11 Xuanwo

@PsiACE do you know why we need to keep our own arrow-format?

We need support for higher versions of tonic and prost.

PsiACE avatar Nov 11 '22 02:11 PsiACE

We need support for higher versions of tonic and prost.

Ok, I got it. Related to https://github.com/DataEngineeringLabs/arrow-format/pull/5

Xuanwo avatar Nov 11 '22 04:11 Xuanwo

@PsiACE do you know why we need to keep our own arrow-format?

This has been addressed by https://github.com/datafuselabs/databend/pull/8785.

Thanks @GPSnoopy @jorgecarleitao

Xuanwo avatar Nov 14 '22 06:11 Xuanwo