fluvio
[Feature] Run without Kubernetes
Is there a simple way to run Fluvio without Kubernetes?
My orchestrator is Hashicorp Nomad
Currently, Fluvio is designed to run on top of Kubernetes. We have a long-term plan to decouple from Kubernetes to allow it to run on different orchestrators. What is your use case and how soon do you need it so we can plan accordingly?
I see Fluvio as a Kafka replacement in my use case, so less Java and more Rust.
My setup is simple: no Kubernetes at all, with HashiCorp Nomad as the orchestrator because it's the only one that runs on all my platforms:
- Linux with Podman
- FreeBSD with pot, and also macOS
I can't give you a date for when I'll need it outside of Kubernetes. Currently I'm at the dev/MVP stage of the platform, and I have a Kafka cluster to cover this stage, but removing Kafka and the JVM is very appealing.
What do you use Kafka for and do you need any connector support?
Currently, we are doing the POC with Kafka and connectors to PostgreSQL (Debezium and sink).
All other parts are in Python or Rust, so nothing special.
One particularity: we use the Avro format heavily; we don't use JSON between services.
Thanks for the info. Can you share how much memory each broker consumes and the type of machine instance? We are trying to perform a baseline comparison with Kafka.
I use Oracle Cloud for the POC at this time: Ampere (ARM) A1 instances with 2 vCores and 12 GB of memory. Honestly, they are overkill in terms of memory, so I'm not the best source for those stats at this time, sorry.
The other alternative that I'm thinking of benchmarking is NATS.
https://www.oracle.com/cloud/compute/arm/#:~:text=Oracle%20Cloud%20Infrastructure%20offers%20Ampere,cache%2C%20and%20delivers%20predictable%20performance.
https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/apache-kafka-benchmarks-on-aws-graviton2
Great. Thanks!
Kubernetes coupling is really a killer. Personally, I wish I could test it to see if it could be used for market data ingestion as a replacement for Redpanda.
BTW, as an alternative to Kafka, there is Redpanda which is extremely simple to install (just a binary...) and configure, and in C++ with WASM support. Also, it doesn't need a "controller", instead uses the Raft consensus for the deployment cluster. It avoids the common issue of having an entire cluster down if the single controller is down...
See comparisons here: https://redpanda.com/blog/fast-and-safe/
Fluvio as a Rust competitor with more safety and less conformity with the Kafka API could really prove worthy. As soon as the need for K8s is abandoned...
@arbfay thank you for the feedback. Fluvio was designed for hyper-scale deployments to handle huge volumes of data. In large environments, SPUs may be physically located on an edge device or different geo-location. In such situations, there is a need for a controller that sits outside of the SPU to ensure the overall health of the distributed system. Hence the need for separation between Controller and the SPUs.
Although, we do agree that the dependency on Kubernetes is a headache for small environments. We could use some help from the community on solving this issue.
> @arbfay thank you for the feedback. Fluvio was designed for hyper-scale deployments to handle huge volumes of data. In large environments, SPUs may be physically located on an edge device or different geo-location. In such situations, there is a need for a controller that sits outside of the SPU to ensure the overall health of the distributed system. Hence the need for separation between Controller and the SPUs.
For example, if health checks are done via Consul (service mesh and registry), Fluvio can use it to find out which other instances are up and running by looking up active.fluvio.service.consul.
With Kafka, that is exactly what I do when I need to configure a consumer or producer: the list of brokers is built using Consul ((active.)kafka.service.consul).
It's just an idea
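To make the Consul idea above concrete, here is a minimal Python sketch of building a broker list from a Consul tag lookup. It is only an illustration under stated assumptions: the helper names `resolve_with_system_dns` and `broker_list` are hypothetical, the system resolver is assumed to forward `.consul` names to Consul's DNS interface, and the port 9092 is a placeholder because a plain address lookup does not return ports (an SRV lookup would).

```python
import socket
from typing import Callable, List

# Assumption: every broker listens on the same port; 9092 is only a placeholder.
DEFAULT_PORT = 9092


def resolve_with_system_dns(name: str) -> List[str]:
    """Return the sorted, de-duplicated IPv4 addresses behind a DNS name."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def broker_list(service: str, tag: str = "active", port: int = DEFAULT_PORT,
                resolver: Callable[[str], List[str]] = resolve_with_system_dns) -> str:
    """Build a comma-separated host:port list from <tag>.<service>.service.consul."""
    name = f"{tag}.{service}.service.consul"
    addresses = resolver(name)
    if not addresses:
        raise LookupError(f"no instance registered under {name}")
    return ",".join(f"{addr}:{port}" for addr in addresses)
```

With this sketch, `broker_list("kafka")` would look up active.kafka.service.consul and `broker_list("fluvio")` would look up active.fluvio.service.consul. The `resolver` parameter exists only so the lookup can be swapped out, for example in tests or when querying Consul directly instead of going through the system resolver.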
> Although, we do agree that the dependency on Kubernetes is a headache for small environments. We could use some help from the community on solving this issue.
Sorry, but as an example, Cloudflare runs services on HashiCorp Nomad: https://blog.cloudflare.com/how-we-use-hashicorp-nomad/
Production without Kubernetes is absolutely possible, and it is closer to the KISS principle (Keep It Simple, Stupid) of the UNIX philosophy.
Our reliance on Kubernetes was based on the assumption that it was the most widely deployed platform, and we wanted to make it easy to deploy on it. We take advantage of Kubernetes features like deployment management to run connectors in a single unified platform, which Redpanda and Kafka don't do. But it seems there is a need to support other schedulers such as Nomad.
Related to #1558
Closing this issue as completed. Running a Fluvio cluster without Kubernetes is now supported by default, using fluvio cluster start or fluvio cluster start --local.