traefik-proxy

kv store support and clients

Open minrk opened this issue 2 years ago • 10 comments

We're in an awkward position with key-value (KV) store support: there don't appear to be any maintained Python clients for etcd or consul, which is a bit weird. Traefik supports several KV stores, and we happened to pick etcd and consul, not for any hugely specific reason, but because they are single binaries, which makes them easy to install.

We've been using https://github.com/kragniz/python-etcd3 which is mostly unmaintained, and a breakage in grpcio prompted a few "works for me" forks, which may or may not take over, or end up abandoned, too. https://opendev.org/openstack/etcd3gw appears to be maintained, but doesn't seem to be meant for use by anyone, given its lack of documentation or any publicly-facing bug reporting, contributions, or anything, and the fact that roughly the only thing in its docs - a pip install command - has the wrong package name. python-consul2 also appears to be abandoned with no real candidate for an alternative.

grpcio/protobuf in general doesn't seem to be a good stack for Python clients, which I think would be better served by far simpler, more stable HTTP APIs.

I don't think we really care what's used, and the Python redis API situation is far healthier than etcd or consul. All we really care about is being able to support multiple traefik replicas in z2jh.

I think bootstrapping the KV store is far less important than bootstrapping traefik itself. In any situation where a KV backend is used, the KV store is almost guaranteed to be run separately in a container (there's no real reason to use KV on a single machine like littlest-jupyterhub, where files work just fine). So there's ~no situation where I imagine the install.py bootstrapping of a KV store being useful in practice, and it's certainly not worth the relatively high maintenance cost of keeping install.py updated, compared with the small cost of end users installing a single binary of their choice.

So the questions are:

  1. what KV stores do we support?
  2. what tools do we support installing ourselves (just traefik, or traefik, etcd, consul, etc.)?

I currently think we should:

  • remove etcd and consul from install.py, leave it just for traefik
  • maybe deprecate consul support altogether (don't delete it because it works, but don't put more effort into maintaining it)
  • consider adding redis, as the far healthier option on the Python side
  • consider rewriting etcd to use HTTP instead of any etcd3 Python client. Our uses are so minimal, that this may be the simplest approach
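A minimal sketch of what the HTTP-only approach in the last bullet might look like, assuming etcd's v3 JSON gateway endpoints (`/v3/kv/put`, `/v3/kv/range`, `/v3/kv/deleterange`), which expect base64-encoded keys and values. The helper names and the `http://localhost:2379` URL are hypothetical, not anything from traefik-proxy, and auth, TLS, and error handling are left out.

```python
import base64
import json
from urllib.request import Request, urlopen

ETCD_URL = "http://localhost:2379"  # assumption: local, unauthenticated etcd


def _b64(s: str) -> str:
    """etcd's JSON gateway expects keys and values as base64 strings."""
    return base64.b64encode(s.encode("utf8")).decode("ascii")


def put_payload(key: str, value: str) -> dict:
    """Body for POST /v3/kv/put."""
    return {"key": _b64(key), "value": _b64(value)}


def prefix_payload(prefix: str) -> dict:
    """Body for /v3/kv/range or /v3/kv/deleterange covering all keys under prefix.

    range_end is the prefix with its last byte incremented
    (assumes the last byte is below 0xff, which holds for ASCII prefixes).
    """
    end = bytearray(prefix.encode("utf8"))
    end[-1] += 1
    return {
        "key": _b64(prefix),
        "range_end": base64.b64encode(bytes(end)).decode("ascii"),
    }


def decode_kvs(response: dict) -> dict:
    """Decode the base64 keys and values of a /v3/kv/range response."""
    return {
        base64.b64decode(kv["key"]).decode("utf8"):
            base64.b64decode(kv.get("value", "")).decode("utf8")
        for kv in response.get("kvs", [])
    }


def etcd_post(path: str, payload: dict) -> dict:
    """POST a JSON body to the gateway (requires a running etcd)."""
    req = Request(
        ETCD_URL + path,
        data=json.dumps(payload).encode("utf8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)
```

With a running etcd, usage would be along the lines of `etcd_post("/v3/kv/put", put_payload("traefik/some/key", "value"))` and `decode_kvs(etcd_post("/v3/kv/range", prefix_payload("traefik/")))`. Older etcd releases expose the same gateway under a different path prefix, so the `/v3` paths here are also an assumption about the server version.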

minrk avatar Feb 21 '23 14:02 minrk

I think supporting redis as the (default?) distributed KV store makes sense. It's widely used and understood, the Python module is maintained by redis, it's easily runnable as a container/helm chart as well as a fully managed cloud service, and if you did want to install it on a VM it's most likely in your Linux distribution repository.

I don't think we need to support installing redis/etcd/consul in the installer since there's a file backend https://github.com/jupyterhub/traefik-proxy/blob/main/jupyterhub_traefik_proxy/fileprovider.py

This would be similar to how JupyterHub supports multiple databases like PostgreSQL and MySQL, but only sqlite is supported out of the box, and we don't include other databases as part of the installation process.

manics avatar Feb 21 '23 15:02 manics

I'm +1 on keeping only traefik in the installer.

As for which KV stores to support, I'd also advocate adding redis from a maintainability point of view, and deprecating both consul and etcd.

> consider rewriting etcd to use HTTP instead of any etcd3 Python client. Our uses are so minimal, that this may be the simplest approach

I believe it makes sense to track this in a separate issue that could be implemented if and when the need arises.

GeorgianaElena avatar Feb 21 '23 15:02 GeorgianaElena

Does anyone remember the background decisions that led to the choice of consul and etcd? This would help us decide whether to keep, deprecate or drop them.

manics avatar Feb 21 '23 16:02 manics

Agreed. I also thought install.py was a little unnecessary, and it adds a maintenance burden, with having to change the checksums, etc. I was completely unaware that python-etcd3 and python-consul were no longer maintained, though.

alexleach avatar Feb 21 '23 18:02 alexleach

On a slightly related, but separate note... I guess (because I've never deployed a Kubernetes cluster) TLJH describes how to deploy a Kubernetes cluster with jupyterhub-traefik-proxy running as a service. Personally, I use docker-compose to run jupyterhub with jupyterhub-traefik-proxy in one service and traefik in another service (actually in a completely separate docker-compose project). I don't bother with etcd or consul backends, as I run this on a single host, so I personally find the high availability backends unnecessary. What I'm getting at is that I think an example / minimal docker-compose file, with related config files and documentation, would be useful. Thoughts? I'm happy to put some time into this.

alexleach avatar Feb 21 '23 20:02 alexleach

> Does anyone remember the background decisions that led to the choice of consul and etcd? This would help us decide whether to keep, deprecate or drop them.

IIRC (maybe @GeorgianaElena remembers better), etcd was selected first, just because it was the first and simplest KV store that came to mind. We picked up consul due to apparent performance issues with etcd (#56). Both being simple Go binaries also makes them easy to install/deploy, e.g. for tests, but I don't think they were chosen with great care.

traefik config-loading seems to be incredibly slow compared to CHP, but we need to revisit the benchmarks to get an updated comparison (#163). Maybe we can get redis in there as well. I can't seem to find a benchmark of traefik's KV performance for different providers. The main consideration is traefik key-value watch performance, which does seem to vary across KV implementations, at least in traefik 1.x.

minrk avatar Feb 24 '23 13:02 minrk

Exploring consul clients a bit more, there's:

  • hc-pyconsul, which appears to be brand new and active, but only created/used by one person so far
  • py-consul, which is a slightly less outdated fork than python-consul2, but is explicitly a temporary fork that doesn't allow Issues, so it isn't really planned as a stable client

minrk avatar Mar 13 '23 09:03 minrk

After #185, it should be a lot easier to add KV implementations like redis, since only 3 methods need to be implemented: generic methods to add, remove, and get keys from a KV store.
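To illustrate the shape of such a three-method backend, here is a rough sketch. The class and method names are hypothetical (they are not the names used in traefik-proxy), the in-memory dict stands in for redis/etcd/consul, and the `traefik/http/routers/...` key layout is an assumption about how nested routing config maps onto flat KV paths.

```python
from abc import ABC, abstractmethod


class KVStore(ABC):
    """Hypothetical backend interface: three generic operations."""

    @abstractmethod
    def set_keys(self, to_set: dict) -> None:
        """Add or update a batch of key/value pairs."""

    @abstractmethod
    def delete_keys(self, keys: list) -> None:
        """Remove a batch of keys (missing keys are ignored)."""

    @abstractmethod
    def get_tree(self, prefix: str) -> dict:
        """Return every key/value pair stored under a prefix."""


class DictKVStore(KVStore):
    """In-memory stand-in for a real KV backend."""

    def __init__(self):
        self._data = {}

    def set_keys(self, to_set):
        self._data.update(to_set)

    def delete_keys(self, keys):
        for key in keys:
            self._data.pop(key, None)

    def get_tree(self, prefix):
        return {k: v for k, v in self._data.items() if k.startswith(prefix)}


def flatten(config: dict, prefix: str, sep: str = "/") -> dict:
    """Flatten nested config into path-like KV keys with string values."""
    flat = {}
    for name, value in config.items():
        key = f"{prefix}{sep}{name}"
        if isinstance(value, dict):
            flat.update(flatten(value, key, sep))
        else:
            flat[key] = str(value)
    return flat


# Usage: store one route, read it back, then remove it.
store = DictKVStore()
route = {"http": {"routers": {"user-a": {"rule": "PathPrefix(`/user/a`)"}}}}
store.set_keys(flatten(route, "traefik"))
stored = store.get_tree("traefik/http")
store.delete_keys(list(stored))
```

A real backend would only need to swap the dict operations for the equivalent client calls, ideally applying each batch atomically.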

minrk avatar Mar 17 '23 13:03 minrk

btw, I found etcd3gw's development page, which I couldn't find last time since all of its official links are broken. It's still clearly actively maintained, but shows all signs of being a purely internal tool, not meant for public use.

The next time etcd breaks, if it happens, I think we should either:

  • drop etcd3 support, or
  • use etcd3gw and vendor Calico's auth-adding subclass

minrk avatar Jan 19 '24 08:01 minrk

Great to see redis support getting merged! Is this going to be released anytime soon?

aberey avatar Apr 22 '24 06:04 aberey