Migrate self-hosted kafka clusters to KRaft

Status: Open · hubertdeng123 opened this issue 2 years ago · 7 comments

Migrating to KRaft removes the need for ZooKeeper in self-hosted. We attempted this previously, but it would have resulted in data loss in Kafka, so we're holding off until a later date when it's safe to perform.

Relevant PRs: https://github.com/getsentry/self-hosted/pull/2445 https://github.com/getsentry/self-hosted/pull/2500
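For context on what "no ZooKeeper" means in practice: in KRaft mode, Kafka keeps cluster metadata in its own Raft-based controller quorum. Below is a minimal sketch, assuming a single node that acts as both broker and controller ("combined" mode), of a helper that renders such a `server.properties`. The property names are standard Apache Kafka KRaft settings; the helper name `kraft_properties`, the hostname, the ports, and the `log.dirs` path are hypothetical, and this is not the configuration the self-hosted repository actually ships.

```python
def kraft_properties(node_id: int, host: str, broker_port: int = 9092,
                     controller_port: int = 9093) -> str:
    """Render server.properties for a hypothetical single-node KRaft cluster
    where one process is both broker and controller (no ZooKeeper)."""
    props = {
        # Combined mode: this node serves clients and votes in the metadata quorum.
        "process.roles": "broker,controller",
        "node.id": str(node_id),
        # Static controller quorum with a single voter: this node.
        "controller.quorum.voters": f"{node_id}@{host}:{controller_port}",
        # One listener for clients/brokers, one dedicated to the controller quorum.
        "listeners": f"PLAINTEXT://:{broker_port},CONTROLLER://:{controller_port}",
        "advertised.listeners": f"PLAINTEXT://{host}:{broker_port}",
        "controller.listener.names": "CONTROLLER",
        "inter.broker.listener.name": "PLAINTEXT",
        "listener.security.protocol.map": "PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT",
        # Hypothetical data directory; note there is no zookeeper.connect entry.
        "log.dirs": "/var/lib/kafka/data",
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    print(kraft_properties(1, "kafka"), end="")
```

Before the first start, a KRaft log directory has to be formatted with a cluster ID, e.g. `bin/kafka-storage.sh random-uuid` followed by `bin/kafka-storage.sh format -t <uuid> -c server.properties`. This only sets up a fresh cluster; it does not by itself carry over data from an existing ZooKeeper-based cluster.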

hubertdeng123, Oct 19 '23

I'd vote for Redpanda instead. It should be compatible enough with the Kafka API, unless Sentry uses features introduced after Kafka v3.1, since Redpanda's compatibility covers v0.11.0 to v3.1 (see docs). However, we'd need to create a migration from the existing Kafka to Redpanda. The steps I can think of are:

  1. Create a new volume for sentry-redpanda.
  2. Consume every message from Kafka and re-publish it to Redpanda.
  3. Once finished, don't delete the sentry-kafka volume. Leave it as is until we're confident there are no further issues.
  4. Stop the Kafka container and replace its Docker image with Redpanda (so the hostname stays "kafka").
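Step 2 is the only step that actually moves data. A minimal sketch of that copy loop follows, written against hypothetical `Source`/`Sink` interfaces rather than any real client library; in practice they would be backed by a consumer reading from the old Kafka and a producer writing to Redpanda (which client to use is an assumption the thread doesn't settle). A hypothetical in-memory `MemoryBroker` stands in for both sides so the loop runs on its own. It snapshots each partition's low/high watermarks before copying, so the loop has a fixed end point, and it copies partition by partition, keeping topic, partition, key, and headers so per-key ordering survives.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class Record:
    topic: str
    partition: int
    offset: int
    key: Optional[bytes]
    value: Optional[bytes]
    headers: Tuple[Tuple[str, bytes], ...] = ()


class Source(Protocol):
    # Hypothetical: {partition: (low_watermark, high_watermark)} for a topic.
    def watermarks(self, topic: str) -> Dict[int, Tuple[int, int]]: ...
    # Hypothetical: records with start <= offset < stop, in offset order.
    def read(self, topic: str, partition: int, start: int, stop: int) -> Iterable[Record]: ...


class Sink(Protocol):
    def write(self, record: Record) -> None: ...
    def flush(self) -> None: ...


def mirror_topic(src: Source, dst: Sink, topic: str) -> int:
    """Copy every record currently in `topic` from src to dst; return the count."""
    # Snapshot watermarks first, so records produced during the copy are not chased.
    snapshot = src.watermarks(topic)
    copied = 0
    for partition in sorted(snapshot):
        low, high = snapshot[partition]
        # Start at the low watermark: older offsets may already be gone to retention.
        for record in src.read(topic, partition, low, high):
            # Same topic/partition/key on the destination keeps per-key ordering.
            dst.write(record)
            copied += 1
    dst.flush()  # ensure buffered writes reach the destination
    return copied


class MemoryBroker:
    """Hypothetical in-memory broker, used only to exercise mirror_topic."""

    def __init__(self) -> None:
        self.logs: Dict[Tuple[str, int], List[Record]] = {}

    def produce(self, topic: str, partition: int, key: Optional[bytes],
                value: Optional[bytes], headers: Tuple[Tuple[str, bytes], ...] = ()) -> None:
        log = self.logs.setdefault((topic, partition), [])
        # The receiving broker assigns its own offsets.
        log.append(Record(topic, partition, len(log), key, value, tuple(headers)))

    def watermarks(self, topic: str) -> Dict[int, Tuple[int, int]]:
        return {p: (0, len(log)) for (t, p), log in self.logs.items() if t == topic}

    def read(self, topic: str, partition: int, start: int, stop: int) -> Iterable[Record]:
        return self.logs[(topic, partition)][start:stop]

    def write(self, record: Record) -> None:
        self.produce(record.topic, record.partition, record.key, record.value, record.headers)

    def flush(self) -> None:
        pass  # nothing is buffered in memory


if __name__ == "__main__":
    kafka, redpanda = MemoryBroker(), MemoryBroker()
    kafka.produce("events", 0, b"k1", b"a")
    kafka.produce("events", 1, b"k2", b"b")
    kafka.produce("events", 0, b"k1", b"c")
    print(mirror_topic(kafka, redpanda, "events"))  # 3
```

Because the destination assigns its own offsets, they may not line up with the source's, so consumer-group positions probably can't be copied over verbatim. Stopping Sentry's producers before the copy avoids having to reconcile messages written after the snapshot.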

Or, as an alternative, they have this one: https://docs.redpanda.com/current/upgrade/migrate/data-migration/

The reason for using Redpanda is to avoid the heavy resource consumption of the JVM. Redpanda is far more lightweight than Kafka; I've been running it in production (a 3-node cluster) for around 18 months.

aldy505, Oct 20 '23

We'd like to be as similar to SaaS as we can be. Right now, the ClickHouse version is far behind and is introducing issues in self-hosted that are not seen in SaaS (one here!). I fear that with introducing Redpanda, there will be an additional burden of maintenance placed on us since other Sentry developers will be on a different platform.

hubertdeng123, Oct 23 '23

@hubertdeng123 So... is this still on the timeline? And is there any way the community can know which versions of ClickHouse, Postgres, and Kafka the SaaS instance is running, so self-hosted can be kept much the same?

aldy505, Jan 17 '24

I can confirm that Sentry can work with Redpanda. Connected without any issues; everything is working.

hd-deman, Jan 19 '24

> I fear that with introducing Redpanda, there will be an additional burden of maintenance placed on us since other Sentry developers will be on a different platform.

While I am unaware of the technical requirements involved, what about migrating both SaaS and self-hosted to Redpanda? Wouldn't there be cost savings in SaaS from reduced resource usage and higher throughput?

Codel1417, Jun 01 '24

Since https://github.com/getsentry/self-hosted/pull/3263 was merged, ZooKeeper is no longer needed. @aldy505, can you open a PR with your existing Redpanda work, please? It has been working for me for months.

williamdes, Aug 16 '24

> @aldy505, can you open a PR with your existing Redpanda work, please? It has been working for me for months.

I'm gonna get back to you later. I'm planning to do some A/B testing comparing Redpanda with Kafka in KRaft mode.

I'm on 24.8.0 with Redpanda and it still works fine.

aldy505, Aug 16 '24