
[Feature] [BanyanDB] Cross DataCenter/Cluster Data Synchronism

Open wu-sheng opened this issue 3 years ago • 6 comments

Search before asking

  • [X] I had searched in the issues and found no similar feature requirement.

Description

Inside one cluster, data synchronism is replication, which has been mentioned in https://github.com/apache/skywalking/issues/8501. Here, I want to highlight that we also need Cross DataCenter/Cluster Data Synchronism so that telemetry data can be shipped between data centers. For example, in the target cluster, OAP works with a BanyanDB server (a) for analysis and monitoring, while another BanyanDB server in a different cluster accepts data synced from server (a) and provides read-only queries for the OPS team.

Use case

No response

Related issues

No response

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

wu-sheng avatar Feb 06 '22 09:02 wu-sheng

Some follow up notes.

Unidirectional with NATS and CRDT:

  • https://github.com/simpleiot/simpleiot/blob/c92937dd5754eed5769c9075025b850cbc341e9b/docs/user/faq.md#q-cant-nats-jetstream-do-everything-siot-does

NATS JetStream embedding:

  • A simple example of how to do it:
    • https://github.com/simpleiot/simpleiot/blob/master/server/nats-server.go
  • It should be possible to run with or without the NATS server embedded:
    • kmm serve --nats.embed
    • https://www.byronruth.com/implementing-an-event-store-on-nats-part-2/

Time smearing

  • So that ordering is not the only way to reach convergence.
  • Google does it automatically for all of its hosted systems. We need our own so that people can deploy anywhere.

gedw99 avatar Jan 30 '23 15:01 gedw99

I had a great talk with @gedw99, who proposed leveraging NATS as the connectivity layer for cross-cluster replication. The basic ideas are:

  1. Embed a NATS server into the BanyanDB server, as was done with the embedded etcd.
  2. Introduce new APIs to configure and observe the replication process at runtime.
  3. Build the replication on the WAL replication logs. Until the WAL is available, direct writing will be used instead.
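
To make the third idea more concrete, here is a minimal sketch of unidirectional, log-based replication. It is not BanyanDB code: `WriteAheadLog`, `ReadOnlyReplica`, and `replicate` are hypothetical names, the record layout is invented for illustration, and a plain function call stands in for the NATS transport. It also assumes a single source that delivers entries in sequence order.

```python
from dataclasses import dataclass, field


@dataclass
class WriteAheadLog:
    """Append-only log on the source server (hypothetical); sequences start at 1."""
    entries: list = field(default_factory=list)

    def append(self, record: dict) -> int:
        self.entries.append(record)
        return len(self.entries)  # sequence number of the new entry

    def read_after(self, seq: int) -> list:
        """Return (sequence, record) pairs whose sequence is greater than seq."""
        return [(i + 1, r) for i, r in enumerate(self.entries) if i + 1 > seq]


@dataclass
class ReadOnlyReplica:
    """Target server (hypothetical): applies replicated records, serves queries."""
    data: dict = field(default_factory=dict)
    last_applied: int = 0

    def apply(self, seq: int, record: dict) -> None:
        if seq <= self.last_applied:
            return  # already applied, so a redelivered entry is ignored
        self.data[(record["series"], record["timestamp_ms"])] = record["value"]
        self.last_applied = seq

    def query(self, series: str, timestamp_ms: int):
        return self.data.get((series, timestamp_ms))


def replicate(wal: WriteAheadLog, replica: ReadOnlyReplica) -> int:
    """Ship every entry the replica has not applied yet; return how many were shipped."""
    pending = wal.read_after(replica.last_applied)
    for seq, record in pending:
        replica.apply(seq, record)  # in a real setup this hop would cross clusters
    return len(pending)


# Usage: two rounds of replication from source to replica.
wal = WriteAheadLog()
wal.append({"series": "svc-a.latency", "timestamp_ms": 1000, "value": 12.5})
wal.append({"series": "svc-a.latency", "timestamp_ms": 2000, "value": 13.0})

replica = ReadOnlyReplica()
first_round = replicate(wal, replica)   # ships both entries

wal.append({"series": "svc-b.latency", "timestamp_ms": 1000, "value": 7.25})
second_round = replicate(wal, replica)  # ships only the new entry
third_round = replicate(wal, replica)   # nothing left to ship
```

Tracking `last_applied` on the replica lets replication resume after a disconnect without resending the whole log, which is one reason a log is a convenient base for this kind of synchronization.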

Next, I will work with @gedw99 to draft a design to provide more details.

hanahmily avatar Jan 30 '23 15:01 hanahmily

Was awesome for me too @hanahmily

Looking forward to working on this with you and the team.

gedw99 avatar Jan 30 '23 15:01 gedw99

@gedw99 After some investigation into the time system, "leap second" is not a problem for BanyanDB:

  • The client writes the timestamp. BanyanDB is okay with writing data points in the future or the past.
  • The replication procedure will rely on the timestamp from clients. Most writes are idempotent, and ordering is not critical to BanyanDB's data model.
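
A small sketch of why this helps, under stated assumptions: the `DataPoint` type and the `(series, timestamp)` key below are hypothetical and are not BanyanDB's actual data model, and each key is assumed to carry a single value. With client-assigned timestamps as part of the key, replaying writes in any order, or more than once, leaves the replica in the same state.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class DataPoint:
    series: str        # hypothetical series identifier
    timestamp_ms: int  # assigned by the client, not by the receiving server
    value: float


def apply_writes(store: dict, writes) -> dict:
    """Apply writes to a store keyed by (series, client timestamp)."""
    for w in writes:
        # Re-applying a data point rewrites the same key with the same value,
        # so duplicates and reordering do not change the final state.
        store[(w.series, w.timestamp_ms)] = w.value
    return store


writes = [
    DataPoint("svc-a.latency", 1_000, 12.5),
    DataPoint("svc-a.latency", 2_000, 13.0),
    DataPoint("svc-b.latency", 1_000, 7.25),
]

source = apply_writes({}, writes)

# Every delivery order, each with one redelivered duplicate, converges
# to the same state as the source.
converged = all(
    apply_writes({}, list(order) + [order[0]]) == source
    for order in permutations(writes)
)
```

If two different values could be written under the same key, arrival order would matter again, which is presumably why the note above says "most" rather than "all".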

hanahmily avatar Feb 05 '23 05:02 hanahmily

OK, got it. Good to know, as it’s simpler.

gedw99 avatar Feb 05 '23 10:02 gedw99

@gedw99 The draft is here: https://docs.google.com/document/d/1qQn1u6VZUbxEWnflSCfpowlXWYqwK25pYY7MuCvoaq4/edit?usp=sharing. Could you take a look at it? If you have any questions, please leave comments on it.

I know you have a better knowledge of NATS. Could you help with the Security and Observability sections?

hanahmily avatar Feb 12 '23 07:02 hanahmily