mysql_async
mysql_async copied to clipboard
Extraneous bytes in RotateEvent file name
Hello
I'm using mysql_async to write a program that tails the MariaDB binlog. I'm using MariaDB 10.5.15. The server is configured with the following binlog-related settings:
[mysqld]
log-bin = mariadb-bin
binlog-format = row
binlog-row-metadata = full
The following binlogs are available:
> show binary logs;
+--------------------+-----------+
| Log_name | File_size |
+--------------------+-----------+
| mariadb-bin.000001 | 1074 |
| mariadb-bin.000002 | 5664 |
+--------------------+-----------+
2 rows in set (0.000 sec)
When running the program below, I'm seeing extraneous characters in the result of the name()
and name_raw()
methods in the RotateEvent
struct:
use futures_util::StreamExt;
use mysql_async::{binlog::events::EventData, BinlogRequest, Conn, OptsBuilder, Result};
#[tokio::main]
async fn main() -> Result<()> {
let opts = OptsBuilder::default()
.ip_or_hostname("localhost")
.tcp_port(3306)
.user(Some("user"))
.pass(Some("pass"));
let conn = Conn::new(opts).await?;
let req = BinlogRequest::new(1)
.with_filename("mariadb-bin.000001".as_bytes())
.with_pos::<u64>(4);
let mut stream = conn.get_binlog_stream(req).await?;
while let Some(Ok(event)) = stream.next().await {
let data = event.read_data()?.unwrap();
match data {
EventData::RotateEvent(e) => {
println!("[-] rotate event");
println!(" - header pos={}", event.header().log_pos());
println!(" - name={}", e.name().to_string());
println!(" - raw={:?}", e.name_raw());
println!(" - position={}", e.position());
}
_ => {}
}
}
Ok(())
}
The program output is as follows:
[-] rotate event
- header pos=0
- name=mariadb-bin.000001R<sT
- raw=[109, 97, 114, 105, 97, 100, 98, 45, 98, 105, 110, 46, 48, 48, 48, 48, 48, 49, 82, 60, 115, 84]
- position=4
[-] rotate event
- header pos=0
- name=mariadb-bin.000002
- raw=[109, 97, 114, 105, 97, 100, 98, 45, 98, 105, 110, 46, 48, 48, 48, 48, 48, 50]
- position=4
Notice that in the first event, there are 4 extra bytes after the ones corresponding to mariadb-bin.000001
, 82, 60, 115, 84
, which I've found to be unexpected.
If I change the BinlogRequest
as follows (pointing it to the end of the last binlog file):
let req = BinlogRequest::new(1)
.with_filename("mariadb-bin.000002".as_bytes())
.with_pos::<u64>(5664);
then I get extra bytes in the name of that file:
[-] rotate event
- header pos=0
- name=mariadb-bin.000002?���
- raw=[109, 97, 114, 105, 97, 100, 98, 45, 98, 105, 110, 46, 48, 48, 48, 48, 48, 50, 63, 188, 203, 26]
- position=5446
Do you know what could be causing this?
Hi. Most likely a bug in the parser. Could you, please, attach the files?
Well, it seems that this rotate event is FAKE_ROTATE_EVENT, i.e. it is not written into the binlog file. I believe the issue is the following:
Client must read FORMAT_DESCRIPTION_EVENT (FDE) to be able to correctly handle event footer (i.e. to know that there should be (or should not be) a checksum), but FAKE_ROTATE_EVENT goes first and there is no way for a client to guess server configuration and to distinguish whether the footer is empty or it is not empty and contains a checksum. It seems that your server configured in a way that it adds checksums to binlog events, so this 4 additional bytes is most likely a checksum.
I'm not sure how to correctly handle this. For now I suggest you to delay this event until FDE is received. Then you should be able to chop off a checksum if it's necessary (you can decide by looking at (fde.footer().get_checksum_alg())
Yeah, I’ve just checked and the server is configured with binlog_checksum = CRC32
.
The Go canal
package seems to identify these fake events by checking if the header position is zero. But in my first example above, where I read from the first binlog file, the header position is also zero in the second rotate event, which presumably is not a fake one…
I’ll try to use the FDE and report back, but probably only on Monday.
This seems to have worked, though it seems weird to me because the stripped-off bytes don't match the result from event.calc_checksum()
:
use futures_util::StreamExt;
use mysql_async::{
binlog::events::{BinlogEventFooter, EventData},
BinlogRequest, Conn, OptsBuilder, Result,
};
#[tokio::main]
async fn main() -> Result<()> {
let opts = OptsBuilder::default()
.ip_or_hostname("localhost")
.tcp_port(3306)
.user(Some("user"))
.pass(Some("pass"));
let conn = Conn::new(opts).await?;
let req = BinlogRequest::new(1).with_pos::<u64>(4);
let mut stream = conn.get_binlog_stream(req).await?;
let mut name = "".to_owned();
while let Some(Ok(event)) = stream.next().await {
let data = event.read_data()?.unwrap();
match data {
EventData::RotateEvent(e) => {
println!("[-] rotate event");
if name.len() == 0 && event.header().timestamp() == 0 {
// first fake rotate event
name = if let Ok(Some(alg)) = event.fde().footer().get_checksum_alg() {
let cksum = event.calc_checksum(alg);
println!(" - checksum={:?}", u32::to_be_bytes(cksum));
let raw = e.name_raw();
let raw = &raw[..raw.len() - BinlogEventFooter::BINLOG_CHECKSUM_LEN];
String::from_utf8_lossy(raw).to_string()
} else {
e.name().to_string()
}
} else {
name = e.name().to_string();
}
println!(" - name={}", name);
println!(" - orig name={:?}", e.name_raw());
}
_ => {}
}
}
Ok(())
}
Output:
[-] rotate event
- checksum=[33, 68, 223, 28]
- name=mariadb-bin.000001
- orig name=[109, 97, 114, 105, 97, 100, 98, 45, 98, 105, 110, 46, 48, 48, 48, 48, 48, 49, 82, 60, 115, 84]
[-] rotate event
- name=mariadb-bin.000002
- orig name=[109, 97, 114, 105, 97, 100, 98, 45, 98, 105, 110, 46, 48, 48, 48, 48, 48, 50]
Do you think this workaround makes sense? I'm not sure if the calculated checksums are really supposed to match with extra bytes after the file name.
I've tried replacing MariaDB with MySQL, and in that setup the extra bytes do not appear, even with binlog_checksum = CRC32
. In MySQL, event.fde().footer().get_checksum_alg()
returns BinlogChecksumAlg::BINLOG_CHECKSUM_ALG_OFF
, while in MariaDB it returns BinlogChecksumAlg::BINLOG_CHECKSUM_ALG_CRC32
.
I've also noticed that the extra bytes only seem to appear in the first event yielded by the BinlogStream
, and this seems to happen regardless of the file and position parameters given to the BinlogRequest
. So currently I'm using this code to extract the binlog file name from a RotateEvent
, where the first
parameter is only true
for the first event of the stream.
fn binlog_file_name(event: &RotateEvent, footer: &BinlogEventFooter, first: bool) -> String {
if !first {
return event.name().to_string();
}
if let Ok(Some(BinlogChecksumAlg::BINLOG_CHECKSUM_ALG_CRC32)) = footer.get_checksum_alg() {
let name = event.name_raw();
let name = &name[..name.len() - BinlogEventFooter::BINLOG_CHECKSUM_LEN];
return String::from_utf8_lossy(name).to_string();
}
event.name().to_string()
}