
Develop a caching library for etcd

Open • serathius opened this issue 10 months ago • 38 comments

Submitted as a project for Google Summer of Code, with @MadhavJivrajani as second mentor.

While etcd is a powerful distributed key-value store, building scalable infrastructure management systems directly on top of it can be challenging. Kubernetes has demonstrated the effectiveness of the reconciliation pattern for managing complex deployments, and its watch cache plays a crucial role in achieving scalability. However, this crucial caching mechanism is tightly coupled with Kubernetes and not readily available for general etcd usage. Projects like Cilium and Calico Typha, while successfully using etcd for control planes, have had to implement custom solutions to address this gap.

This project addresses the need for a standardized, performant caching solution for etcd, enabling easier adoption of the reconciliation pattern and simplifying the development of scalable etcd-based systems. By providing a generic watch cache implementation, we aim to lower the barrier to entry for building robust and efficient infrastructure management tools on etcd.

Goals:

  • Develop a generic proxy that provides feature parity with the K8s watch cache.
  • Enable integration into K8s and Cilium.

Milestones:

  • Cache for watch requests: stores the history of watch events and demultiplexes requests, combining multiple local Watch requests into a single watch to etcd.
  • Cache for non-consistent list requests: stores the latest state of etcd in a B-tree cache. The cache is populated from a Range response and is then kept up to date by subscribing to updates from the watch cache.
  • Handling requests during cache initialization and re-initialization.
  • Testing, including e2e and robustness tests
  • Metrics for cache size, cache latency, etc.
  • Benchmarks for watch and read throughput.
  • Support for custom encoder/decoder
  • Support for custom indexing
  • Support for consistent reads
  • Support for exact stale reads, by storing snapshots of btree.
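The first milestone (a watch cache that keeps event history and demultiplexes many local watches onto one etcd watch) can be illustrated with a minimal sketch. It is not the proposed library's design: `Event`, `Demux`, `Watch`, and `Broadcast` are hypothetical names, history is a plain slice rather than a ring buffer, and a watcher whose buffer fills up is simply closed, loosely in the spirit of how the K8s watch cache terminates watchers that fall behind. The single upstream etcd watch is left out so the sketch has no external dependencies.

```go
package main

import (
	"strings"
	"sync"
)

// Event is a hypothetical, simplified stand-in for an etcd watch event.
type Event struct {
	Revision int64
	Key      string
	Value    string
}

// watcher is one local subscriber interested in keys under prefix.
type watcher struct {
	prefix string
	ch     chan Event
}

// Demux fans out events from ONE upstream etcd watch to many local watchers
// and keeps a bounded history so new watchers can start from a past revision.
type Demux struct {
	mu       sync.Mutex
	history  []Event // oldest first, at most maxHist entries
	maxHist  int
	watchers map[int]*watcher
	nextID   int
}

func NewDemux(maxHist int) *Demux {
	return &Demux{maxHist: maxHist, watchers: map[int]*watcher{}}
}

// trySend delivers ev to w if the key matches w's prefix. It never blocks;
// false means the watcher's buffer is full.
func trySend(w *watcher, ev Event) bool {
	if !strings.HasPrefix(ev.Key, w.prefix) {
		return true
	}
	select {
	case w.ch <- ev:
		return true
	default:
		return false
	}
}

// Watch subscribes to keys under prefix from startRev onward, replaying
// matching history first. It returns false if startRev is older than the
// retained history (the caller must re-list, like a "compacted" error) or if
// buf is too small to hold the replayed events.
func (d *Demux) Watch(prefix string, startRev int64, buf int) (<-chan Event, bool) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if len(d.history) > 0 && startRev < d.history[0].Revision {
		return nil, false
	}
	w := &watcher{prefix: prefix, ch: make(chan Event, buf)}
	for _, ev := range d.history {
		if ev.Revision >= startRev && !trySend(w, ev) {
			return nil, false
		}
	}
	d.watchers[d.nextID] = w
	d.nextID++
	return w.ch, true
}

// Broadcast is called for every event read from the single upstream watch.
func (d *Demux) Broadcast(ev Event) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.history = append(d.history, ev)
	if len(d.history) > d.maxHist {
		d.history = d.history[1:]
	}
	for id, w := range d.watchers {
		if !trySend(w, ev) {
			close(w.ch) // too slow: terminate; the client has to re-watch
			delete(d.watchers, id)
		}
	}
}
```

In a fuller version, the upstream side would be a single etcd watch (for example `clientv3.Watcher.Watch` with `clientv3.WithPrefix()` and `clientv3.WithRev(...)`) whose events are passed to `Broadcast`, so N local watchers cost one etcd watch.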

I'm proposing to locate the project within the etcd mono repo, but as a separate package that will not be released/tagged until it's ready. Proposed package name: go.etcd.io/cache. The client library would be developed under go.etcd.io/cache/client.

/cc @fuweid @MadhavJivrajani @ahrtr @henrybear327

serathius • Feb 10 '25 10:02

At a high level, I agree with the improvement and direction, as performance should be one of the key areas we spend more effort on. It will definitely help ensure the long-term success of etcd.

ahrtr • Feb 10 '25 10:02

We need to develop a generic cache for etcd that allows users to easily adopt a multi-layered caching architecture similar to K8s. Having an official library would allow us to properly test it, ensuring its correctness and performance.

Sounds great. It could be more efficient to make and evaluate changes as an official library. +1, happy to help if needed.

fuweid • Feb 10 '25 19:02

cc @ahrtr @ivanvc, any preference on where development should happen? My proposal:

I'm proposing to locate the project within the etcd mono repo, but as a separate package that will not be released/tagged until it's ready. Proposed package name: go.etcd.io/etcd/cache. The client library would be developed under go.etcd.io/etcd/cache/client.

serathius • Feb 10 '25 19:02

I'm proposing to locate the project within the etcd mono repo, but as a separate package

It should be OK.

go.etcd.io/cache/client

I think all packages in the etcd mono repo should have the same prefix go.etcd.io/etcd/. Also, is the cache dedicated to the watch scenario, or could it be used for other cases as well? Could you provide more context or details before we make any detailed decision?

ahrtr • Feb 10 '25 20:02

I think all packages in the etcd mono repo should have the same prefix go.etcd.io/etcd/

OK, I don't think that should be a problem.

I expect that at the top level of the hierarchy we will want a client cache and a standalone cache server (like a gRPC proxy, but based on the new cache library, with configurable caching, covering all Range types and with proper guarantees). Within the client cache we will have a separate watch demultiplexer and a cache for range requests.
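The cache for range requests mentioned above (seeded from a Range response, then kept current by watch events) can be sketched as follows. This is an illustration under stated assumptions, not the proposed design: `StoreCache`, `KV`, and `WatchEvent` are hypothetical names, and a sorted slice stands in for the B-tree named in the milestones so the sketch needs no external dependency. `List` serves possibly stale data and declines when the cache is older than a caller-supplied minimum revision.

```go
package main

import (
	"sort"
	"strings"
	"sync"
)

// KV is a hypothetical, simplified key-value entry.
type KV struct {
	Key         string
	Value       string
	ModRevision int64
}

// WatchEvent is a hypothetical, simplified watch event; Delete marks removals.
type WatchEvent struct {
	Revision int64
	Key      string
	Value    string
	Delete   bool
}

// StoreCache holds the latest state of a key range. A sorted slice stands in
// for the B-tree so that the sketch has no external dependencies.
type StoreCache struct {
	mu  sync.RWMutex
	kvs []KV  // sorted by Key
	rev int64 // etcd revision the cache currently reflects
}

// NewStoreCache seeds the cache from a Range response taken at revision rev.
func NewStoreCache(snapshot []KV, rev int64) *StoreCache {
	kvs := append([]KV(nil), snapshot...)
	sort.Slice(kvs, func(i, j int) bool { return kvs[i].Key < kvs[j].Key })
	return &StoreCache{kvs: kvs, rev: rev}
}

// Apply updates the cache from a watch event. Events from one transaction
// share a revision, so only strictly older events are skipped; re-applying
// an event at the current revision is idempotent.
func (c *StoreCache) Apply(ev WatchEvent) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if ev.Revision < c.rev {
		return
	}
	i := sort.Search(len(c.kvs), func(i int) bool { return c.kvs[i].Key >= ev.Key })
	found := i < len(c.kvs) && c.kvs[i].Key == ev.Key
	entry := KV{Key: ev.Key, Value: ev.Value, ModRevision: ev.Revision}
	switch {
	case ev.Delete && found:
		c.kvs = append(c.kvs[:i], c.kvs[i+1:]...)
	case !ev.Delete && found:
		c.kvs[i] = entry
	case !ev.Delete:
		c.kvs = append(c.kvs, KV{}) // grow by one, then shift right
		copy(c.kvs[i+1:], c.kvs[i:])
		c.kvs[i] = entry
	}
	c.rev = ev.Revision
}

// List serves a non-consistent list for prefix. It reports false when the
// cache is older than minRev, so the caller can fall back to etcd.
func (c *StoreCache) List(prefix string, minRev int64) ([]KV, int64, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if c.rev < minRev {
		return nil, c.rev, false
	}
	start := sort.Search(len(c.kvs), func(i int) bool { return c.kvs[i].Key >= prefix })
	var out []KV
	for i := start; i < len(c.kvs) && strings.HasPrefix(c.kvs[i].Key, prefix); i++ {
		out = append(out, c.kvs[i])
	}
	return out, c.rev, true
}
```

Seeding from a Range at revision R and then watching from R+1 is what lets such a cache follow etcd without gaps; exact stale reads would additionally require keeping snapshots per revision, which this sketch does not do.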

serathius • Feb 10 '25 20:02

I expect that at the top level of the hierarchy we will want a client cache and a standalone cache server (like a gRPC proxy, but based on the new cache library, with configurable caching, covering all Range types and with proper guarantees). Within the client cache we will have a separate watch demultiplexer and a cache for range requests.

Can we have a spec & design doc for these?

ahrtr • Feb 10 '25 21:02

No, I was just providing more context. Using go.etcd.io/etcd/cache package should be ok.

serathius • Feb 12 '25 08:02

This sounds exciting, and I’d love to take it up as part of Google Summer of Code. The idea of a standardized caching solution for etcd is impactful and I'd love to implement this as my project.

I'm currently exploring how we've implemented caching in k8s and I look forward to mirroring something similar for etcd in this project. Looking forward to the opportunity to contribute and collaborate with everyone on this.

abdurrehman107 • Feb 27 '25 13:02

Hi, I have a question!

It seems the B-tree structure in the API server was introduced recently. May I ask what has motivated the community to strengthen the etcd caching logic?

I'd like to know whether there were specific team goals behind the recent API server activity and this proposal :)

mutokrm • Feb 28 '25 03:02

@mutokrm the main motivations can be found in this KEP: https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/4988-snapshottable-api-server-cache

MadhavJivrajani • Mar 06 '25 00:03

For folks following along here, here are a few pointers for more context:

  • https://sched.co/1R2wD
  • https://youtu.be/PLSDvFjR9HY?feature=shared
  • https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/4988-snapshottable-api-server-cache
  • https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/2340-Consistent-reads-from-cache

MadhavJivrajani • Mar 06 '25 00:03

Hello everyone, my name is Burhanuddin and I am a CS undergrad. I am looking forward to contributing to etcd. To get started, I have learned a bit about Kubernetes, up to the level of building CRDs. I have also started learning the Go programming language, since I am new to it. What I am doing next is:

  1. do a quickstart tutorial to understand the etcd project better
  2. follow developer guide from here: https://etcd.io/docs/v3.5/

What I need help with, @serathius @MadhavJivrajani:

  1. I can't find any contributor guide in the docs to be able to set up etcd in my local environment and test changes locally.
  2. Since this project seems more involved in terms of complexity (which is my motivation for contributing to this repo), I would appreciate guidance on what the development setup for the project looks like and which parts of the project are more involved.
  3. I have a lot of questions around the project idea but I will defer them till I can get a working setup locally.

So I will start with the docs and setting up project locally and see what things I need help with. However, in order to not get overwhelmed with the complexity of the project, I will need guidance on the local setup.

@serathius @MadhavJivrajani is there some other channel for communication?

burhanuddin6 • Mar 09 '25 10:03

Hello @serathius and @MadhavJivrajani,

My name is Sywen, and I am an early-career software engineer. I found this opportunity through the GSoC page, and I am really excited about contributing to a generic, lower-level caching library. I’ve been diving into both the Kubernetes watch cache implementation and the etcd codebase; definitely a lot to unpack!

I understand that this project eventually aims to replace the K8s built-in library. I’m very interested in contributing not just during GSoC but potentially as a long-term contributor, and I wanted to clarify a few aspects of the project scope and design:

  • I noticed that the project is labeled small, but given the complexity—watch multiplexing, B-tree indexing, stale reads, and cache consistency—it looks quite ambitious. I’d love more details to understand how much of the architecture design is predefined vs how much contributors would be shaping in terms of granularity, as expected in the application to contribute.
  • I’d love your advice on whether we are targeting a PoC or aiming for a production ready implementation within the GSoC timeline.

Thanks!!

Symorglass • Mar 11 '25 17:03

I noticed that the project is labeled small, but given the complexity—watch multiplexing, B-tree indexing, stale reads, and cache consistency—it looks quite ambitious. I’d love more details to understand how much of the architecture design is predefined vs how much contributors would be shaping in terms of granularity, as expected in the application to contribute

The small size was based on two factors: there is a reference implementation in K8s that matches one-to-one what we want to do, and the code will be independent of the rest of the codebase, meaning no legacy code to learn or integrate with.

I’d love your advice on whether we are targeting a PoC or aiming for a production ready implementation within the GSoC timeline.

Production ready in K8s takes at least 1 year :P

serathius • Mar 12 '25 09:03

Hi, I also came across this project through GSoC, and I’m excited about the potential of a generic caching library and proxy for etcd. I’d love to contribute for the long haul and help make it production-ready.

@MadhavJivrajani, thanks for the links to the additional context.

K-minutti • Mar 16 '25 20:03

Hi @MadhavJivrajani and @serathius, my name is Bob. I am a software engineer. I found this opportunity through the GSoC page, and it would be my pleasure to contribute to the etcd caching and proxy features.

Here is the preparation I've done so far:

  1. Learned about the etcd project.
  2. Learned how Kubernetes uses etcd.
  3. Read the linked material for more context.

I am looking forward to this chance!

POABOB • Mar 19 '25 01:03

Hi, I'm Jeff. I'm an undergraduate student interested in this GSoC project. I have gone through the related information. Before going further, I have some assumptions to check and some questions to clarify.

Assumptions

  1. The goal of the project is a client library that has the same interface as the etcd client v3 but provides cached list (with a B-tree) and cached watch (with a watch cache) APIs.
  2. All reference code is located in "k8s.io/apiserver/pkg/storage".

Questions

  1. Should the client library be completely independent of k8s (I guess this is the original goal), or is partial dependence on k8s allowed (e.g. runtime.Object may be useful for projects like Cilium and Calico mentioned above)?
  2. I noticed that some components of the cache in k8s are under active development (e.g. the delegator: according to KEP-4568, whose checklist is empty, it does not seem ready yet). Should I wait for them to stabilize before going further?

I'm looking forward to further discussion on this project!

FouoF • Mar 20 '25 08:03

Hi all, I'm replying to questions in this comment. As a reminder, public communication is far more appreciated than DMs! Feel free to ping us here on this issue, or on the sig-etcd channel on the Kubernetes Slack.

Please also note that responses may be delayed due to a high volume of queries and KubeCon taking place in the first week of April. You are strongly encouraged to bring your questions to the etcd slack channel in order to get them answered.


@burhanuddin6

is there some other channel for communication?

We have a Slack channel on the Kubernetes Slack (slack.k8s.io) called #sig-etcd.

Please also see: https://github.com/kubernetes/community/tree/master/sig-etcd


@FouoF

Should the client library be completely independent of k8s (I guess this is the original goal), or is partial dependence on k8s allowed (e.g. runtime.Object may be useful for projects like Cilium and Calico mentioned above)?

That is correct. It is completely okay to use dependencies if needed. However, it should not exist as part of the Kubernetes codebase for the reasons mentioned in the issue.

I noticed that some components of the cache in k8s are under active development (e.g. the delegator: according to KEP-4568, whose checklist is empty, it does not seem ready yet). Should I wait for them to stabilize before going further?

You won't need to wait for these. Ideally in the long run, features like KEP-4568 will simply call into the library that we build and we don't necessarily need to rely on their implementation.

MadhavJivrajani • Mar 22 '25 22:03

Hello 👋,

I am Ikenna, a senior computer science student with an interest in distributed systems. I have taken classes in distributed systems, networking, and databases, and I enjoy exploring these domains outside the classroom. This will be a fun project to work on.

Ikenna-Okpala • Mar 23 '25 01:03

Hi everyone, I'm applying for GSoC under CNCF to work on developing a generic watch cache for etcd. My proposal aims to create a caching layer similar to Kubernetes' watch cache but as a standalone package (go.etcd.io/cache) to improve scalability and simplify infrastructure management on etcd. This will help projects relying on etcd, like Cilium and Calico, by providing a standardized solution for caching watch events and list requests.

A bit about me—I’m a software engineer primarily working with Go, and I enjoy building scalable backend systems and efficient algorithms. I’ve previously worked on distributed systems concepts like MapReduce and have been exploring geospatial data processing. I’m excited about this project as it aligns with my interest in making infrastructure tools more efficient and developer-friendly.

kriyanshii • Mar 24 '25 16:03

CCing @marseel, who offered to help and provide feedback on potential Cilium integration. Marcel works at Isovalent on Cilium scalability and is the Chair of Kubernetes SIG Scalability.

/cc @marseel

serathius • Mar 26 '25 13:03

Hi, it seems to be a very interesting challenge to take on.

A little about me: I am a recent CS graduate, and I have some experience with k8s from an internship and from studying distributed systems (KV stores and Paxos, specifically).

I’m particularly interested in the challenge of making the cache reusable without losing efficiency. Balancing generality (custom indexing) with performance (like fast watch demultiplexing and low list latency) seems important, and I’d love to help quantify and address that.

A quick question: given the complexity, are you envisioning a thin compatibility layer over K8s internals, or a ground-up reimplementation guided by its design?

Looking forward to contributing and learning through this!

davvyin • Mar 27 '25 04:03

Hi @serathius and @marseel, I checked the kvstore implementation in Cilium: it uses a simple map as its watch cache, so integration should not be hard as long as the new cache library is compatible with the etcd client v3. For Calico, the etcd v3 client is recommended as a replacement for Calico Typha. Since the cache in the K8s codebase is naturally compatible with K8s, could successful integration with common users like Cilium and Calico be a milestone?

FouoF • Mar 28 '25 01:03

This is a project in the etcd repository, and the mentors' approval rights are limited to etcd. Milestones should not depend on other projects where we don't have merge rights. We might collaborate, collect feedback, or propose a PoC, but we cannot take a dependency.

serathius • Mar 28 '25 08:03

This is a project in the etcd repository, and the mentors' approval rights are limited to etcd. Milestones should not depend on other projects where we don't have merge rights. We might collaborate, collect feedback, or propose a PoC, but we cannot take a dependency.

Thanks for your reply. I will focus on the etcd repository itself first and keep the needs of potential users in mind during design.

FouoF • Mar 29 '25 03:03

Hello everyone, I only learned about GSoC a few days ago; I don't know if it's too late for me to contribute.

I'm a DevOps engineer, and I haven't contributed to open source before, but I'm willing to learn.

I'm still going through the whole document to figure out where I can contribute.

But I can't find the link to the Slack channel. Could you please share it?

@serathius @MadhavJivrajani

Lumen-jane • Mar 31 '25 08:03

@Lumen-jane please see https://github.com/etcd-io/etcd/issues/19371#issuecomment-2745901733

MadhavJivrajani • Apr 03 '25 17:04

I have created a proposal to implement a generic etcd cache library following the milestones outlined here (https://github.com/etcd-io/etcd/issues/19371#issue-2842006414).

I would like to thank everyone who asked questions in public forums, and I would be very happy if I could deliver value to the etcd community through this GSoC project.

kei01234kei • Apr 07 '25 15:04

Neat idea, thanks for posting. Having had just a cursory look, it occurred to me that it might be elegant if it implemented the same methods as a regular etcd client, so that it could be a drop-in replacement for existing code that currently uses the normal v3 client methods.

Cheers

purpleidea • Apr 11 '25 18:04

Hi everyone,

I'm Yunkai, a master's student at UC Berkeley.

Since submitting my GSoC proposal for this project, I've started prototyping some of the core components — including a basic watcher module and an in-memory caching prototype inspired by Kubernetes-style watch-based indexing.

The code is currently in a private GitHub repo while I continue refining things, but I'd be happy to share access with anyone interested — feel free to reach out anytime.

Looking forward to continuing work on this!

— Yunkai

kaikaila • Apr 12 '25 03:04