tb
Tail the Binlog of a database
⚠️ tb is no longer maintained. ⚠️
It is preserved here for the convenience of those who are still using it. We are no longer accepting issues or pull requests for tb.
If you need to make modifications, consider forking this repository. We'd be happy to update this notice with links to actively-maintained forks.
tb tails database binlogs. You point it at a database, and it streams every
write made to that database into an Avro Object Container File (OCF)
per table.
You can tail these files using the materialized AVRO OCF source
type. This lets you fire push-based events when something of interest happens
in your database, in a lightweight, decoupled fashion that would
traditionally require database triggers or something more involved.
Usage
The easiest way to get started is to use the Docker image with a mounted
directory that materialized has access to:
docker run --rm -v /tmp/tbshare:/tbshare materialize/tb \
  -t postgres \
  -p 5432 \
  -d <database name> \
  -H "$POSTGRES_HOST" \
  -u "$POSTGRES_USER" \
  -P "$POSTGRES_PASSWORD" \
  --dir /tbshare/data \
  --save-file /tbshare/status
Important: Ensure your database meets the requirements that are detailed below.
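Once tb is running, it writes one OCF file per table into the mounted directory, named tb.<schema>.<table>. As a quick sanity check from the host (assuming a table named my_table in the public schema), list the directory:
ls /tmp/tbshare/data
# expect a file like: tb.public.my_table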
If you only want to monitor specific tables, pass the --whitelist flag:
--whitelist schemaName1.tableName1,schemaName2.tableName2
After starting materialized, run the appropriate CREATE SOURCE command
from a SQL shell:
CREATE SOURCE my_table
FROM AVRO OCF '/tbshare/data/tb.public.my_table' WITH (tail = true)
ENVELOPE DEBEZIUM;
After that, the standard Materialize experience will be available to you:
SHOW COLUMNS FROM my_table;
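To get the push-based events described above, one lightweight pattern is to maintain a materialized view over the source and TAIL it. This is a sketch, not tb-specific: the view name and query are illustrative, and TAIL availability depends on your Materialize version.
-- Maintain a running count of rows arriving from the binlog.
CREATE MATERIALIZED VIEW my_table_count AS
    SELECT count(*) FROM my_table;

-- Stream every change to the view to stdout (e.g. from psql).
COPY (TAIL my_table_count) TO STDOUT;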
Implementation
tb is built by embedding Debezium. Debezium is a distributed,
fault-tolerant platform for tailing binlogs into Kafka, built on the Kafka
Connect framework. tb is intended to be a lightweight tool, suitable for
running on your laptop or against a single unreplicated database. The intent is
to enable a prototyping experience that doesn't require six containers tethered
together in an intricate docker-compose setup, while still translating easily
to existing Kafka infrastructure.
The exact semantics of change data capture vary based on which database you
connect to and which version you're running. tb relies on streaming
replication, a relatively new feature in many databases, so for best results
use a recent version of your favorite database. Streaming replication requires
some configuration, but care has been taken to keep that configuration close
to best practices for database replication. If your database is up to date and
running in a replicated setup, connecting tb to it should be seamless, with
minimal modification.
Currently, tb supports PostgreSQL (10+ recommended) and MySQL.
Database requirements
PostgreSQL
You should be running PostgreSQL 10+. Support for PostgreSQL <= 9 is possible,
but requires installing a plug-in such as decoderbufs or wal2json in the
database. PostgreSQL 10 onwards includes built-in streaming replication
capabilities via pgoutput, which is what tb uses out-of-the-box.
You will have to set the Write-ahead logging (WAL) level to logical.
ALTER SYSTEM SET wal_level TO logical;
You will need to restart your database server after changing this setting.
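Once the server is back up, you can confirm the setting took effect from any SQL session:
SHOW wal_level;
-- expected output: logical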
Tables should be created with their replica identity set to FULL, so that
updates and deletes carry the complete old row.
CREATE TABLE my_table (a int, b int);
ALTER TABLE my_table REPLICA IDENTITY FULL;
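To verify the replica identity of an existing table (my_table is the example name from above), you can query the system catalog:
SELECT relname, relreplident FROM pg_class WHERE relname = 'my_table';
-- relreplident is 'f' when the replica identity is FULL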
MySQL
You should be running MySQL with row-level binary logging. This means your
my.cnf file should have entries that look like this:
log_bin = mysql-bin
binlog_format = row
If you use Homebrew on macOS, the my.cnf file lives at
/usr/local/etc/my.cnf. On Linux, consult your distribution's documentation.
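After restarting MySQL, you can verify both settings from a MySQL shell:
SHOW VARIABLES LIKE 'log_bin';        -- expected: ON
SHOW VARIABLES LIKE 'binlog_format';  -- expected: ROW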
Building and running
To build and run tb, do the following:
- Run mvn package. See Maven's Introduction to the Build Lifecycle document for details on the Maven build system.
- Run the newly created JAR with java -jar target/tb-0.1-SNAPSHOT.jar [args].
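The JAR accepts the same arguments as the Docker example above. For instance (the host, database name, and paths here are illustrative):
java -jar target/tb-0.1-SNAPSHOT.jar \
  -t postgres \
  -p 5432 \
  -d mydb \
  -H localhost \
  -u "$POSTGRES_USER" \
  -P "$POSTGRES_PASSWORD" \
  --dir /tmp/tbshare/data \
  --save-file /tmp/tbshare/status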