skywalking-swck
skywalking-swck copied to clipboard
[Feature] Add BanyanDB CRD & Controllers
Original Issue
https://github.com/apache/skywalking/issues/9396
What changes brought in this PR
In this PR, I add CRD for BanyanDB.
- Use StatefulSet as BanyanDB's underlying controller, so that we can benefit from its stable network identifiers or storage, plus future extensions.
- Use Service as BanyanDB's networking identifier for internal access & communication.
- Use LocalPV as BanyanDB's default storage choice.
I can't see an e2e to verify this. SWCK should be verified whether it really could deploy.
@wu-sheng @lujiajing1126 This is a draft pr for the osapp project. The student @SzyWilliam should still be coding, but I'm not familiar with some features of Banyandb, so could you please confirm whether the Spec field can meet all the current banyandb requirements?
Then this PR should be labeled aa draft or use issue or discussion panel to talk. PR could trigger mail notifications for reviewers.
BTW, I think you ping a wrong people.
@wu-sheng @lujiajing1126 This is a draft pr for the osapp project. The student @SzyWilliam should still be coding, but I'm not familiar with some features of Banyandb, so could you please confirm whether the Spec field can meet all the current banyandb requirements?
As I understand, since the current version of BanyanDB only has a single node, Deployment can meet our requirement.
The operator for VictoriaMetrics
and its VMSingle
CRD can be a good reference for us.
As I understand, since the current version of BanyanDB only has a single node, Deployment can meet our requirement. @lujiajing1126 Thanks for your reply! If we consider future extensions, a StatefulSet will be great for a BanyanDB cluster since it provides bindings on pv & network identifier. Why don't we use StatefulSet now?
@lujiajing1126 Thanks for your reply! If we consider future extensions, a StatefulSet will be great for a BanyanDB cluster since it provides bindings on pv & network identifier. Why don't we use StatefulSet now?
Actually, I'm neutral about the choice. I would just say Deployment
is definitely much simpler and thus has lower cost. The cluster version is on our roadmap but with very few details. It is not possible for us now to cover all what we need in the future.
Deployment
is better here. The cluster mode will know each node's roles, we don't leverage statefulset to get stable and ordered network endpoints and pv.
@dashanji @hanahmily @lujiajing1126 Hi, I've completed a minimal working BanyanDB CRD&Controller for SWCK. It passes e2e tests. Please take a look at this, thanks! There are still some advanced features to be added, like TLS config for banyandb. I plan to add these features in future PRs.
What is the storage volume mount policy?
@hanahmily When we use this on the demo deployment, we need to make sure storage volume could mounted back when pod reboot(this happens time to time on our demo GCP.
What is the storage volume mount policy?
@hanahmily When we use this on the demo deployment, we need to make sure storage volume could mounted back when pod reboot(this happens time to time on our demo GCP.
At present, we provide the PVC API for users. If users create a Persistent Volume, the PVC will find the PV and the data will be stored there.
@dashanji Thanks for the reviews. @hanahmily I am trying to write e2e tests for swck banyandb. Could you please tell me how I can query banyandb pod using client tools like cli?
@dashanji Thanks for the reviews. @hanahmily I am trying to write e2e tests for swck banyandb. Could you please tell me how I can query banyandb pod using client tools like cli?
bydbctl is WIP. you could query data from OAP instead of the banyandb server as https://github.com/apache/skywalking-banyandb/tree/main/test/e2e-v2 .
@SzyWilliam LGTM, thanks for your patience and hard work! Hope you will continue to improve it in the next PRs.
@dashanji Thanks for your reviews! I've completed the full banyandb crd & controller implementation with corresponding e2e tests. @wu-sheng @hanahmily @lujiajing1126 Please take a look at it~🫰
@hanahmily Thanks for the reviews! I've exposed HTTP port and added an ingress resource to route HTTP requests, PTAL. I have a question: according to banyandb github page, we do not support http server for now. Does banyandb support HTTP server now? If so, how I can query data using HTTP?
we do not support http server for now. Does banyandb support HTTP server now?
We supported the HTTP server in the main branch. It will be released in v0.2.0.
If so, how I can query data using HTTP?
bydbctl will take it. The primitive commands will be released in v0.2.0 with the HTTP server.
And you could access the HTTP endpoint directly. The API specifications are at https://github.com/apache/skywalking-banyandb/blob/main/api/proto/openapi
@hanahmily Thanks for the reply and providing framework on Service. I'll use separate gRPC Service and HTTP service. I plan to use bydbctl to verify banyandb data storage in E2E Tests after 0.2.0 release, so I'll just add a todo for now.
I've made changes on code:
- separate grpc & http into two service resources, expose 17912/17913 respectively.
- remove unused ports 2121 6060. @hanahmily Please take a look at it again, thanks!
It is great to see this PR is getting ready after months of work. Congrats.
@lujiajing1126 We need your confirmation here. Please recheck.
@lujiajing1126 Thanks for the careful review! I've made changes on code, please take a look again.