bottledwater-pg
Production-readiness?
Your article on bottled water is pretty compelling, and I'm interested in exploring it for a production use case. However, the fact that an official release has not been cut in over a year gives me pause. What is the plan for this project w.r.t. production readiness?
I'm also wondering the same thing. I don't mind having to keep up with builds from master for now, but are there any plans for further releases? Right now, there's just the original 0.1 from last year.
+1 - is anyone using it in production? Is it "safe" in the sense that the extension only reads the log and cannot corrupt the source Postgres data?
I'm not speaking in any official capacity, but I've been a de facto maintainer recently, so I guess I'm as good a person as any to comment on this.
Is it "safe" in the sense that the extension only reads the log and cannot corrupt the source Postgres data?
Yes, Bottled Water never writes to your data. Bottled Water has two parts:
- a client daemon, which connects to Postgres using the replication protocol and consumes the logical replication stream, and thus is unable to modify any data (similar to how a read replica would work)
- a Postgres extension, which registers a few user-defined functions for exporting a snapshot of the database and encoding data as Avro, and provides a logical decoding plugin that Avro-encodes the data that gets written to the logical replication stream.
That said, it is possible for it to affect the availability of Postgres, most notably because Postgres has to retain its WAL (write-ahead log) until Bottled Water has consumed it. If Bottled Water stops consuming (for example, if it can't publish because Kafka is down or there is a problem with the topic config), that retained WAL can grow without limit, potentially filling up whatever disk it's on. That condition should be easy enough to detect (monitor disk space on the WAL partition), and can be remediated by killing the Bottled Water daemon and executing SELECT pg_drop_replication_slot('bottledwater'); to free up the retained WAL.
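As a rough illustration of the monitoring idea above, here is a minimal Python sketch. It is not part of Bottled Water. The SQL string assumes the slot is named 'bottledwater' (as in the drop command above) and uses the PostgreSQL 9.4-9.6 function names; the helper names and the alert threshold are made up for the example. The helpers only do the LSN arithmetic, so they run without a database connection.

```python
# Query to run against the source database (assumption: slot name 'bottledwater').
# These are the PostgreSQL 9.4-9.6 names; on PostgreSQL 10+ the equivalents are
# pg_current_wal_lsn() and pg_wal_lsn_diff().
RETAINED_WAL_SQL = """
SELECT slot_name,
       active,
       pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
WHERE slot_name = 'bottledwater';
"""


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a byte position.

    An LSN is printed as two hexadecimal numbers: the high 32 bits and the
    low 32 bits of a 64-bit position in the WAL.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL that Postgres must keep for a slot at restart_lsn."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


def should_alert(current_lsn: str, restart_lsn: str, limit_bytes: int) -> bool:
    """Hypothetical alert rule: fire when retained WAL exceeds limit_bytes."""
    return retained_wal_bytes(current_lsn, restart_lsn) > limit_bytes
```

Watching the slot directly, in addition to disk space, gives earlier warning that the consumer has stalled. A sensible limit depends on the size of the WAL partition and on the write rate of the database.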
is anyone using it in production?
Not to my knowledge, but a few teams are testing it on production-sized databases.
If you're confident in your monitoring, it should be stable enough to try out in production, although I wouldn't use it on a mission-critical database just yet.
official release
I can't speak to the word "official", or for long-term plans, but in terms of declaring this "production ready", my personal opinion is that will depend on usage. The code is reasonably stable now, but there are still operational questions to be figured out (e.g. #36, #99), and changes to the data representation are likely (e.g. deciding how to represent certain Postgres datatypes). If you have a non-mission-critical database you can try it out on, that will help to answer some of these questions, guided by the use case.
Just in terms of not having to keep up with master, it's certainly feasible to declare an 0.2 release, but there are fixes and improvements landing with some frequency so I'm not sure how meaningful a static version number would be at this stage.
Sam, thanks for the update! I think we're interested to try it but it would be nice to understand the roadmap and the commitment (if any) from Confluent in pushing it forward.
Hi,
Sorry to bump this, but is there any update on roadmap or production-readiness? Is Confluent, or anyone else, actively using the project? Or is everyone just manually publishing data change events to Kafka?
I tried using this project for a semi-offline data crunching production system, but ran into the issue that Heroku does not support the extension required to run Bottled Water. Apparently Heroku tried to get this project production ready with @samstokes's help, but it still destabilizes Postgres too much.
Then I tried Kafka Connect with their https://github.com/confluentinc/kafka-connect-jdbc connector, which supports Postgres, and ran into an out-of-memory issue (https://github.com/confluentinc/kafka-connect-jdbc/issues/34). Confluent gave me a patched version (the fix is still not publicly released because an integration test is missing). This approach works, but is in parts very inflexible.
Long story short, in the end I built my own solution using the Kafka Connect pattern, which runs fine, but I would still love to see this project become fully stable and get supported by Heroku's Kafka setup.
@larskluge What do you mean "but is in parts very inflexible"?
@xrewndel kafka-connect-jdbc is quite inflexible when it comes to customizing the database query. A simple "select * from table" works well, but we often wanted to use more complex queries. Internally, kafka-connect-jdbc modifies the provided query via simple string concatenation. This means a WHERE is appended at the end, so you cannot put a WHERE clause in your own SQL query, nor any other clause that has to come later in the query, such as a GROUP BY.
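To make the concatenation problem concrete, here is a small Python sketch. It is a simplified illustration of the behaviour described above, not the connector's actual code: the function append_incremental_filter, the orders table, and its columns are all invented for the example, and SQLite (from the standard library) stands in for Postgres only to check whether the resulting SQL parses.

```python
import sqlite3


def append_incremental_filter(user_query: str, column: str) -> str:
    """Hypothetical stand-in for the naive approach: glue a WHERE onto the end."""
    return f"{user_query} WHERE {column} > ?"


def is_valid_sql(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> bool:
    """Return True if the statement executes, False on a syntax error."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.OperationalError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")

# A plain SELECT survives the appended WHERE.
simple = append_incremental_filter("SELECT * FROM orders", "id")

# A query that already ends in GROUP BY gets a WHERE in an illegal position.
grouped = append_incremental_filter(
    "SELECT customer, SUM(total) FROM orders GROUP BY customer", "id"
)

# A query with its own WHERE ends up with two WHERE clauses.
filtered = append_incremental_filter("SELECT * FROM orders WHERE total > 10", "id")

for label, sql in [("simple", simple), ("grouped", grouped), ("filtered", filtered)]:
    print(label, is_valid_sql(conn, sql, (0,)))
```

Only the first query parses; the other two fail because the appended WHERE lands after the GROUP BY or after an existing WHERE.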
Debezium seems to have mysql/pgsql connectors that should fill the same role as Bottled Water. Can anyone comment on their approach or feature set, compared to BW? Could the two projects be somehow merged?
The Debezium version solves the problem using the same basic architecture -- a logical decoding plugin allows access to the commit log. Debezium is based on Kafka Connect so it gets many benefits of the framework (e.g. flexible serialization format, apply simple transformations to the data inline during extraction). It can also do consistent snapshots and then stream additional changes. Debezium also has more consistent effort behind it, so is more likely to be well maintained. @rhauch could probably fill in some additional details.
Hi @ewencp - thanks, I believe that Debezium maintainer @rhauch has also moved from Red Hat to Confluent.
Does this collectively mean that this project is being sunsetted in favour of Debezium?
@alexanderdean This project originated in Confluent but has never been officially supported. @samstokes has been the primary maintainer for a while; he can speak to the level of support he's providing.
For Confluent, solutions based on Kafka Connect will generally be preferred as they come with a whole host of benefits -- pluggable serialization, pluggable transformations, scalability, fault tolerance, standardized metrics, monitoring, and logging, etc.
Currently the Debezium PostgreSQL connector requires a custom logical decoding plugin that is similar to, but also much simpler than, Bottled Water's. However, we hope to support the PostgreSQL test logical decoding plugin, which is provided by PostgreSQL and more likely to be usable in production environments (like Amazon RDS).
@rhauch I was reading about the Debezium PostgreSQL connector and realised it needs to run on Postgres 9.6 or above. Is there an alternative connector for version 9.5, given that bottledwater-pg doesn't seem production-ready from the discussion above?
Coming back to this topic years later as the OP, the Debezium Connector for PostgreSQL has indeed become the de facto standard for this kind of work. To @rhauch's point, pgoutput, the standard logical decoding plug-in in PostgreSQL 10+, is now supported as well 🎉
@samstokes @ept @mcapitanio Can we get an update to the README pointing people to this mature option that the community has converged on?
Just my $0.02 - been using Debezium in production for about 2 years, with PostgreSQL and decoderbufs. The development team is helpful and responsive, the release cycle is predictable, the documentation is good. It's CDC heaven.
Thank you to Bottled Water developers as well, for getting the ball rolling. Hopefully CDC will become a standard feature in database world in the coming years.
Closing as there is a well-established alternative in the wild