[Feature] Run without Kubernetes

Open sycured opened this issue 2 years ago • 12 comments

Is there a simple way to run Fluvio without Kubernetes?

My orchestrator is HashiCorp Nomad.

sycured avatar Feb 13 '22 20:02 sycured

Currently, Fluvio is designed to run on top of Kubernetes. We have a long-term plan to decouple from Kubernetes to allow it to run on different orchestrators. What is your use case and how soon do you need it so we can plan accordingly?

sehz avatar Feb 15 '22 15:02 sehz

I see Fluvio as a Kafka replacement for my use case: less Java and more Rust.

My setup is simple: zero Kubernetes, with HashiCorp Nomad as the sole orchestrator because it's the only one that runs on multiple platforms:

  • Linux with Podman
  • FreeBSD with pot, also macOS

I can't give you a date for when I'll need it outside of Kubernetes because I'm currently at the dev/MVP stage of the platform, and I have a Kafka cluster to cover this stage. Still, removing Kafka and the JVM is very appealing.

sycured avatar Feb 15 '22 18:02 sycured

What do you use Kafka for and do you need any connector support?

sehz avatar Feb 16 '22 14:02 sehz

Currently, we're doing the POC with Kafka and connectors to PostgreSQL (Debezium and a sink).

All other parts are in Python or Rust, so nothing special there.

One special thing: heavy use of the Avro format; we don't use JSON between services.

sycured avatar Feb 16 '22 15:02 sycured

Thanks for the info. Can you share how much memory each broker consumes and the type of machine instance? We are trying to perform a baseline comparison with Kafka.

sehz avatar Feb 16 '22 15:02 sehz

I'm using Oracle Cloud for the POC at the moment: Ampere (ARM) A1 instances with 2 vCores and 12 GB of memory. Honestly, they're overkill on memory, so I'm not the best source for those stats right now, sorry.

The other alternative that I'm thinking of benchmarking is NATS.

https://www.oracle.com/cloud/compute/arm/#:~:text=Oracle%20Cloud%20Infrastructure%20offers%20Ampere,cache%2C%20and%20delivers%20predictable%20performance.
https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/apache-kafka-benchmarks-on-aws-graviton2

sycured avatar Feb 16 '22 20:02 sycured

Great. Thanks!

sehz avatar Feb 16 '22 20:02 sehz

Kubernetes coupling is really a killer. Personally, I wish I could test it to see if it could be used for market data ingestion as a replacement for Redpanda.

BTW, as an alternative to Kafka, there is Redpanda, which is extremely simple to install (just a binary...) and configure, and is written in C++ with WASM support. Also, it doesn't need a "controller"; instead it uses Raft consensus across the cluster. That avoids the common issue of having an entire cluster down if the single controller is down...

See comparisons here: https://redpanda.com/blog/fast-and-safe/

Fluvio as a Rust competitor with more safety and less conformity with the Kafka API could really prove worthy. As soon as the need for K8s is abandoned...

arbfay avatar Feb 25 '22 22:02 arbfay

@arbfay thank you for the feedback. Fluvio was designed for hyper-scale deployments to handle huge volumes of data. In large environments, SPUs may be physically located on an edge device or different geo-location. In such situations, there is a need for a controller that sits outside of the SPU to ensure the overall health of the distributed system. Hence the need for separation between Controller and the SPUs.

Although, we do agree that the dependency on Kubernetes is a headache for small environments. We could use some help from the community on solving this issue.

ajhunyady avatar Feb 25 '22 22:02 ajhunyady

> @arbfay thank you for the feedback. Fluvio was designed for hyper-scale deployments to handle huge volumes of data. In large environments, SPUs may be physically located on an edge device or different geo-location. In such situations, there is a need for a controller that sits outside of the SPU to ensure the overall health of the distributed system. Hence the need for separation between Controller and the SPUs.

For example, if health checks go through Consul (service mesh and registry), Fluvio could use it to find out which other instances are up and running by looking up active.fluvio.service.consul.

With Kafka, that's exactly what I do when I need to configure a consumer or producer: the list of brokers is built using Consul ((active.)kafka.service.consul).

It's just an idea
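
A minimal sketch of that idea in Python, assuming Consul's DNS interface is reachable from the client and that the service names above are registered there. The helper names (resolve_hosts, broker_list) and the port are hypothetical and are not part of Fluvio, Kafka, or Consul; the resolver is injectable so the list-building part can be tried without a Consul agent.

```python
import socket
from typing import Callable, List


def resolve_hosts(name: str) -> List[str]:
    """Return the IPv4 addresses that DNS currently reports for `name`.

    Assumption: the system resolver forwards *.consul queries to Consul,
    which by default only answers with instances passing their health checks.
    """
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr is (address, port) for IPv4.
    return [info[4][0] for info in infos]


def broker_list(name: str, port: int,
                resolver: Callable[[str], List[str]] = resolve_hosts) -> str:
    """Build a comma-separated "host:port" list from a service DNS name.

    `port` is an assumption supplied by the caller (plain A records carry
    no port information).
    """
    hosts = sorted(set(resolver(name)))  # de-duplicate, stable order
    return ",".join(f"{host}:{port}" for host in hosts)


if __name__ == "__main__":
    # Stand-in resolver so the example runs without Consul.
    fake = lambda _name: ["10.0.0.2", "10.0.0.1", "10.0.0.2"]
    print(broker_list("active.kafka.service.consul", 9092, resolver=fake))
```

With a real Consul agent, calling broker_list("active.kafka.service.consul", 9092) without the resolver argument would return whatever instances DNS reports at that moment.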

> Although, we do agree that the dependency on Kubernetes is a headache for small environments. We could use some help from the community on solving this issue.

Sorry, but HashiCorp Nomad, for example, runs Cloudflare services: https://blog.cloudflare.com/how-we-use-hashicorp-nomad/

Production without Kubernetes is absolutely possible, and it's closer to KISS (Keep It Simple, Stupid) from the UNIX philosophy.

sycured avatar Feb 26 '22 14:02 sycured

Our reliance on Kubernetes was based on the assumption that it was the most widely deployed platform, and we wanted to make it easier to deploy on it. We take advantage of Kubernetes features like deployment management to run connectors in a single unified platform, which Redpanda and Kafka don't do. But it seems there is a need to support other schedulers such as Nomad.

sehz avatar Feb 26 '22 18:02 sehz

Related to #1558

digikata avatar Jul 26 '23 20:07 digikata

Closing this issue as completed. Running a Fluvio cluster without Kubernetes is now supported by default, using fluvio cluster start or fluvio cluster start --local.

digikata avatar Feb 21 '24 22:02 digikata