mysql-binlog-connector-java

Allow to re-read binlog from a previous position

Open gunnarmorling opened this issue 7 years ago • 7 comments

It'd be very useful to have a way to re-read the binlog from a previous position. Say we set the binlog position to 100 before connecting and then read until position 300; we'd like to be able to re-read the binlog from position 200.

The motivation for this feature request is a simpler means of dealing with rolled-back or committed transactions exposed in the binlog. It's my understanding that rolled-back transactions should not show up in the binlog, but sometimes they actually do. One case is when a temporary table is dropped (see DBZ-390 for the sequence of statements). In such a case, we'd wish to ignore any events from the rolled-back transaction. We could do so by delaying event processing after a BEGIN event until we see the ROLLBACK or COMMIT. In case of the former we'd simply continue processing the binlog after the ROLLBACK; in case of the latter we'd have to go back to the binlog position of the BEGIN and process the events from there.

An alternative approach would be an option which lets this connector propagate only committed transactions and filter out rolled-back transactions. For that, the events after a BEGIN would have to be buffered and only pushed out to event handlers upon a COMMIT, but dropped after a ROLLBACK. It would make things even easier for consumers, but would arguably move complexity (buffer handling) into this connector.

Thanks a lot for considering this request.

gunnarmorling avatar Oct 20 '17 10:10 gunnarmorling

Hi @gunnarmorling.

The thing is, once the BINLOG_DUMP command is sent and the MySQL server starts streaming events, there is no way to "rewind" to some previous position other than reconnect (which by its nature is a relatively heavy operation). "delaying event processing after a BEGIN event until we see the ROLLBACK or COMMIT" is the only option.

What are the reasons for not doing event buffering on the Debezium side?

shyiko avatar Oct 20 '17 11:10 shyiko

Hi, thanks for the quick reply!

> there is no way to "rewind" to some previous position other than reconnect

Yeah, I was kinda afraid that'd be the case. We also considered that but dismissed it due to the overhead.

> What are the reasons for not doing event buffering on the Debezium side?

That's probably what we're going to do. But then I thought we might not be the only ones facing that issue, so it'd be worth having it built into the connector, allowing others to benefit from it, too.

gunnarmorling avatar Oct 20 '17 12:10 gunnarmorling

Sounds reasonable.

My only concern is that people often have some kind of event buffering already in place, usually to group events by transaction (at which point ROLLBACK handling becomes trivial). Also, there is currently no buffering logic inside mysql-binlog-connector-java, which makes it easy to reason about memory consumption and saves us from questions like "how do we handle transactions that span gigabytes" (I have a feeling that the answer might be client-specific).

Anyway, I'm not actually opposed to the idea, and if you guys have an elegant solution, a PR is more than welcome (buffering logic would probably end up in EventDeserializer (which, on a separate note, should never have been implemented as a concrete class), so that both BinaryLogClient & BinaryLogFileReader would gain access to ROLLBACK-aware streaming).

shyiko avatar Oct 20 '17 15:10 shyiko
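
The buffering approach discussed above (hold events after a BEGIN, release them on COMMIT, drop them on ROLLBACK) could look roughly like the following. This is a minimal sketch, not the connector's API: `TxSignal` and `TransactionBuffer` are hypothetical names, and the caller is assumed to classify each event itself. As far as I know, BEGIN and ROLLBACK usually arrive as query events and a commit as either an XID event or a `COMMIT` query event, but treat that mapping as an assumption to verify against your server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical classification of an event; not a type from the connector. */
enum TxSignal { BEGIN, DATA, COMMIT, ROLLBACK }

/**
 * Holds events between BEGIN and COMMIT/ROLLBACK and forwards only
 * committed transactions. E is whatever event type the consumer uses.
 */
final class TransactionBuffer<E> {
    private final Consumer<E> downstream;
    private final List<E> pending = new ArrayList<>();
    private boolean inTransaction;

    TransactionBuffer(Consumer<E> downstream) {
        this.downstream = downstream;
    }

    void accept(TxSignal signal, E event) {
        switch (signal) {
            case BEGIN:
                // Start a fresh buffer; the BEGIN event itself is buffered too.
                pending.clear();
                pending.add(event);
                inTransaction = true;
                break;
            case COMMIT:
                // Transaction is durable: release everything in original order.
                pending.add(event);
                pending.forEach(downstream);
                pending.clear();
                inTransaction = false;
                break;
            case ROLLBACK:
                // Discard the whole transaction, including the ROLLBACK event.
                pending.clear();
                inTransaction = false;
                break;
            default:
                if (inTransaction) {
                    pending.add(event);       // held back until COMMIT or ROLLBACK
                } else {
                    downstream.accept(event); // outside a transaction: pass through
                }
        }
    }
}
```

Note that the buffer here is unbounded and lives in memory, which is exactly the concern raised above about transactions that span gigabytes. A real implementation would need a size cap or a spill-to-disk strategy, and that choice is probably client-specific.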

I have the same problem. My requirement is that the binlog client needs to stop for a while and remember its binlog position. After the client restarts, synchronization should resume from where the client stopped. Is there any way to achieve this? Thank you! @gunnarmorling

ChaoShuChina avatar Jun 19 '18 06:06 ChaoShuChina

@ChaoShuChina Regarding resuming from where the client stopped: do you have any idea how to achieve this? I have the same requirement. Have you managed it, and if so, how?

yong93 avatar Jul 12 '18 02:07 yong93

My client saves the last position it has processed. When it is restarted for some reason it uses that saved position to call BinaryLogClient.setBinlogFilename and BinaryLogClient.setBinlogPosition before connecting. However, in most cases that doesn't do anything. Only sometimes it receives events from that previous position. Could this be an AWS Aurora issue?

wdonne avatar Jul 29 '19 13:07 wdonne

> My client saves the last position it has processed. When it is restarted for some reason it uses that saved position to call BinaryLogClient.setBinlogFilename and BinaryLogClient.setBinlogPosition before connecting. However, in most cases that doesn't do anything. Only sometimes it receives events from that previous position. Could this be an AWS Aurora issue?

Only the position of a BEGIN event can be set. I don't know where to find this in the docs. I hope the author adds this information to the Javadoc in the future.

panda0120 avatar Mar 10 '20 10:03 panda0120