kafka-in-production


Curious how big companies operate their Kafka fleets in production? This might be the repo for you:

What issues arise when running Kafka in production? 📝
How do other organisations attempt to solve these issues? 🛠️
Why are certain approaches adopted over others? :balance_scale:
What can we learn for our own use case?

Adobe
Agoda
Airbnb
Apple
AppsFlyer
Bloomberg
Bolt
Booking.com
Brex
Cloudflare
Coinbase
Criteo
Datadog
Deliveroo
GoTo
Grab
Honeycomb
LinkedIn
Lyft
Morgan Stanley
Netflix
Pinterest
Riskified
Robinhood
Slack
Stripe
Uber
Wise
Wix
Yelp
Zalando
Zopa Bank

Adobe

How Adobe Experience Platform Pipeline Became the Cornerstone of In-Flight Processing for Adobe - 2019 - :books:
Moving Beyond Newtonian Reductionism in the Management of Large-Scale Distributed Systems, Part 2 - 2019 - :books:
Adobe Experience Platform’s Streaming Sources and Destinations Overview and Architecture - 2019 - :books:
Wins from Effective Kafka Monitoring at Adobe: Stability, Performance, and Cost Savings - 2019 - :books:
Creating Adobe Experience Platform Pipeline with Kafka - 2018 - :books:

Agoda

How Agoda manages 1.5 Trillion Events per day on Kafka - 2021 - :books:
Adding Time Lag to Monitor Kafka Consumer - 2021 - :books:
How our data scientists' petabytes of data is ingested into Hadoop (from Kafka) - 2021 - :books:

Airbnb

Migrating Kafka transparently between Zookeeper clusters - 2021 - :books:

Apple

Balance Kafka Cluster with Zero Data Movement - 2023 - :studio_microphone:
Experiences Operating Apache Kafka® at Scale - 2019 - :studio_microphone:
Kafka as a Service A Tale of Security and Multi Tenancy - 2018 - :studio_microphone:

AppsFlyer

Four Crucial Steps to Take Before Changing Kafka Partition Key at Scale - 2023 - :books:
Kafka Lag Monitoring For Human Beings - 2020 - :studio_microphone:
Apache Kafka Lag Monitoring at AppsFlyer - 2020 - :books:
Managing your Kafka in an explosive growth environment - 2019 - :studio_microphone:

Bloomberg

Fully-Managed, Multi-Tenant Kafka Clusters: Tips, Tricks, and Tools - 2022 - :studio_microphone:

Bolt

Using Apache Kafka and ksqlDB for Data Replication at Bolt - 2021 - :studio_microphone:
How Bolt Has Adopted Change Data Capture with Confluent Platform - 2020 - :books:
Kewei Shang - 2020 - :books:

Booking.com

Data Streaming Ecosystem Management at Booking.com - 2018 - :books:

Brex

Transactional Events Publishing At Brex - 2022 - :books:

Cloudflare

Intelligent, automatic restarts for unhealthy Kafka consumers - 2023 - :books:
Using Apache Kafka to process 1 trillion inter-service messages - 2022 - :books:

Coinbase

Kafka infrastructure renovation at Coinbase - 2022 - :books:
How we scaled data streaming at Coinbase using AWS MSK - 2021 - :books:

Criteo

Managing Kafka and Data Streams at Criteo - 2023 - :books:
Upgrading Kafka on a large infra, or: when moving at scale requires careful planning - 2019 - :books:
How Criteo is managing one of the largest Kafka Infrastructure in Europe - 2019 - :books:

Datadog

Running Production Kafka Clusters in Kubernetes - 2019 - :studio_microphone:

Deliveroo

Improving Stream Data Quality With Protobuf Schema Validation - 2019 - :books:

GoTo

Sink Kafka Messages to ClickHouse Using 'ClickHouse Kafka Ingestor' - 2022 - :books:
When Kafka Went Offshore - 2021 - :books:
Enhancing Ziggurat - The Backbone Of Gojek's Kafka Ecosystem - 2021 - :books:
Handling Dead Letters in a Streaming System - 2020 - :books:
How Kafka Solved a Culture Problem at Gojek - 2019 - :books:
Fronting : An Armoured Car for Kafka Ingestion - 2018 - :books:
Sakaar: Taking Kafka data to cloud storage at GO-JEK - 2018 - :books:

Grab

Zero trust with Kafka - 2022 - :books:
How Kafka Connect helps move data seamlessly - 2022 - :books:
Exposing a Kafka Cluster via a VPC Endpoint Service - 2022 - :books:
Detect Fraud Successfully with GrabDefence! - 2021 - :studio_microphone:
Optimally Scaling Kafka Consumer Applications - 2020 - :books:

Honeycomb

Scaling Telemetry Systems with Streaming - 2023 - :studio_microphone:
Lessons Learned From the Migration to Confluent Kafka - 2021 - :books:
Scaling Kafka at Honeycomb - 2021 - :books:
Bitten by a Kafka Bug - Postmortem - 2019 - :books:

LinkedIn

Load-balanced Brooklin Mirror Maker: Replicating large-scale Kafka clusters at LinkedIn - 2022 - :books:
TopicGC: How LinkedIn cleans up unused metadata for its Kafka clusters - 2022 - :books:
How LinkedIn customizes Apache Kafka for 7 trillion messages per day - 2019 - :books:
URP? Excuse You! The Three Metrics You Have to Know - 2018 - :studio_microphone:
Test Strategy for Samza/Kafka Services - 2017 - :books:
Kafka Ecosystem at LinkedIn - 2016 - :books:
Kafkaesque Days at LinkedIn – Part 1 - 2016 - :books:
How We’re Improving and Advancing Kafka at LinkedIn - 2015 - :books:

Lyft

Building an Adaptive, Multi-Tenant Stream Bus with Kafka and Golang - 2020 - :books:
Can Kafka Handle a Lyft Ride? - 2020 - :studio_microphone:
Operating Apache Kafka Clusters 24/7 Without A Global Ops Team - 2019 - :books:
Bulletproof Apache Kafka® with Fault Tree Analysis - 2019 - :studio_microphone:
Production Ready Kafka on Kubernetes - 2019 - :studio_microphone:

Morgan Stanley

Consistent, High-throughput, Real-time Calculation Engines Using Kafka Streams - 2023 - :studio_microphone:

Netflix

Featuring Apache Kafka in the Netflix Studio and Finance World - 2020 - :books:
Inca — Message Tracing and Loss Detection For Streaming Data @Netflix - 2019 - :books:
Evolution of the Netflix Data Pipeline - 2016 - :books:
Kafka Inside Keystone Pipeline - 2016 - :books:

Pinterest

Lessons Learned from Running Apache Kafka at Scale at Pinterest - 2021 - :books:
How Pinterest runs Kafka at scale - 2018 - :books:
Open sourcing DoctorKafka: Kafka cluster healing and workload balancing - 2017 - :books:

Riskified

How to Manage Schemas and Handle Standardization - 2023 - :books:
How to Roll Your Kafka Cluster With Zero Downtime and No Data Loss - 2023 - :books:
Know Your Limits: Cluster Benchmarks - 2022 - :books:
Let’s Make Your CFO Happy; A Practical Guide for Kafka Cost Reduction - 2022 - :studio_microphone:
From AWS CloudFormation to Terraform: Migrating Apache Kafka - 2021 - :books:

Robinhood

Tackling Kafka, with a Small Team - 2019 - :studio_microphone:

Slack

Building Self-driving Kafka clusters using open source components - 2022 - :books:

Stripe

6 Nines: How Stripe keeps Kafka highly-available across the globe - 2022 - :studio_microphone:

Uber

Securing Kafka® Infrastructure at Uber - 2022 - :books:
Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot - 2021 - :books:
Introducing uGroup: Uber’s Consumer Management Framework - 2021 - :books:
Disaster Recovery for Multi-Region Kafka at Uber - 2020 - :books:
Kafka Cluster Federation at Uber - 2019 - :studio_microphone:
Building Reliable Reprocessing and Dead Letter Queues with Apache Kafka - 2018 - :books:
Introducing Chaperone: How Uber Engineering Audits Apache Kafka End-to-End - 2016 - :books:
uReplicator: Uber Engineering’s Robust Apache Kafka Replicator - 2016 - :books:

Wise

Streaming Infrastructure at Wise - 2023 - :studio_microphone:
Rack awareness in Kafka Streams - 2022 - :books:
Teamwork: Implementing a Kafka retry strategy at Wise - 2021 - :books:
Running Kafka in Kubernetes, Part 1: Why we migrated our Kafka clusters to Kubernetes. - 2021 - :books:
Running Kafka in Kubernetes, Part 2: How we migrated our Kafka clusters to Kubernetes. - 2021 - :books:
Securing Kafka with SPIFFE at TransferWise - Jonathan Oddy, Levani Kokhreidze - 2020 - :studio_microphone:
Achieving high availability with stateful Kafka Streams applications - 2018 - :books:

Wix

4 Steps for Kafka Rebalance - Notes From the Field - 2021 - :books:
Wix’s Journey Into Data Streams - 2021 - :books:
Building a High-level SDK for Kafka: Greyhound Unleashed - 2020 - :books:

Yelp

Kafka on PaaSTA: Running Kafka on Kubernetes at Yelp (Part 1 - Architecture) - 2021 - :books:
Streams and Monk – How Yelp is Approaching Kafka in 2020 - 2020 - :books:
Billions of Messages a Day – Yelp’s Real-time Data Pipeline - 2017 - :studio_microphone:

Zalando

Rock Solid Kafka and ZooKeeper Ops on AWS - 2018 - :books:
Many-to-Many Relationships Using Kafka - 2018 - :books:
Event First Development - Moving Towards Kafka Pipeline Applications - 2017 - :books:
Reattaching Kafka EBS in AWS - 2017 - :books:
Real-time Ranking with Apache Kafka’s Streams API - 2017 - :books:
Running Kafka Streams applications in AWS - 2017 - :books:
A Recipe for Kafka Lag Monitoring - 2017 - :books:
Surviving Data Loss - 2017 - :books:

Zopa Bank

Highly Available Kafka Consumers and Kafka Streams on Kubernetes - 2023 - :studio_microphone:
