mysql_async icon indicating copy to clipboard operation
mysql_async copied to clipboard

Extraneous bytes in RotateEvent file name

Open andrenth opened this issue 2 years ago • 7 comments

Hello

I'm using mysql_async to write a program that tails the MariaDB binlog. I'm using MariaDB 10.5.15. The server is configured with the following binlog-related settings:

[mysqld]
log-bin = mariadb-bin
binlog-format = row
binlog-row-metadata = full

The following binlogs are available:

> show binary logs;
+--------------------+-----------+
| Log_name           | File_size |
+--------------------+-----------+
| mariadb-bin.000001 |      1074 |
| mariadb-bin.000002 |      5664 |
+--------------------+-----------+
2 rows in set (0.000 sec)

When running the program below, I'm seeing extraneous characters in the result of the name() and name_raw() methods in the RotateEvent struct:

use futures_util::StreamExt;
use mysql_async::{binlog::events::EventData, BinlogRequest, Conn, OptsBuilder, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let opts = OptsBuilder::default()
        .ip_or_hostname("localhost")
        .tcp_port(3306)
        .user(Some("user"))
        .pass(Some("pass"));

    let conn = Conn::new(opts).await?;
    let req = BinlogRequest::new(1)
        .with_filename("mariadb-bin.000001".as_bytes())
        .with_pos::<u64>(4);

    let mut stream = conn.get_binlog_stream(req).await?;
    while let Some(Ok(event)) = stream.next().await {
        let data = event.read_data()?.unwrap();
        match data {
            EventData::RotateEvent(e) => {
                println!("[-] rotate event");
                println!("    - header pos={}", event.header().log_pos());
                println!("    - name={}", e.name().to_string());
                println!("    - raw={:?}", e.name_raw());
                println!("    - position={}", e.position());
            }
            _ => {}
        }
    }

    Ok(())
}

The program output is as follows:

[-] rotate event
    - header pos=0
    - name=mariadb-bin.000001R<sT
    - raw=[109, 97, 114, 105, 97, 100, 98, 45, 98, 105, 110, 46, 48, 48, 48, 48, 48, 49, 82, 60, 115, 84]
    - position=4
[-] rotate event
    - header pos=0
    - name=mariadb-bin.000002
    - raw=[109, 97, 114, 105, 97, 100, 98, 45, 98, 105, 110, 46, 48, 48, 48, 48, 48, 50]
    - position=4

Notice that in the first event, there are 4 extra bytes after the ones corresponding to mariadb-bin.000001, 82, 60, 115, 84, which I've found to be unexpected.

If I change the BinlogRequest as follows (pointing it to the end of the last binlog file):

    let req = BinlogRequest::new(1)
        .with_filename("mariadb-bin.000002".as_bytes())
        .with_pos::<u64>(5664);

then I get extra bytes in the name of that file:

[-] rotate event
    - header pos=0
    - name=mariadb-bin.000002?���
    - raw=[109, 97, 114, 105, 97, 100, 98, 45, 98, 105, 110, 46, 48, 48, 48, 48, 48, 50, 63, 188, 203, 26]
    - position=5446

Do you know what could be causing this?

andrenth avatar Apr 22 '22 20:04 andrenth

Hi. Most likely a bug in the parser. Could you, please, attach the files?

blackbeam avatar Apr 22 '22 21:04 blackbeam

Well, it seems that this rotate event is FAKE_ROTATE_EVENT, i.e. it is not written into the binlog file. I believe the issue is the following:

Client must read FORMAT_DESCRIPTION_EVENT (FDE) to be able to correctly handle event footer (i.e. to know that there should be (or should not be) a checksum), but FAKE_ROTATE_EVENT goes first and there is no way for a client to guess server configuration and to distinguish whether the footer is empty or it is not empty and contains a checksum. It seems that your server configured in a way that it adds checksums to binlog events, so this 4 additional bytes is most likely a checksum.

I'm not sure how to correctly handle this. For now I suggest you to delay this event until FDE is received. Then you should be able to chop off a checksum if it's necessary (you can decide by looking at (fde.footer().get_checksum_alg())

blackbeam avatar Apr 23 '22 06:04 blackbeam

Yeah, I’ve just checked and the server is configured with binlog_checksum = CRC32.

The Go canal package seems to identify these fake events by checking if the header position is zero. But in my first example above, where I read from the first binlog file, the header position is also zero in the second rotate event, which presumably is not a fake one…

I’ll try to use the FDE and report back, but probably only on Monday.

andrenth avatar Apr 23 '22 15:04 andrenth

This seems to have worked, though it seems weird to me because the stripped-off bytes don't match the result from event.calc_checksum():

use futures_util::StreamExt;
use mysql_async::{
    binlog::events::{BinlogEventFooter, EventData},
    BinlogRequest, Conn, OptsBuilder, Result,
};

#[tokio::main]
async fn main() -> Result<()> {
    let opts = OptsBuilder::default()
        .ip_or_hostname("localhost")
        .tcp_port(3306)
        .user(Some("user"))
        .pass(Some("pass"));

    let conn = Conn::new(opts).await?;
    let req = BinlogRequest::new(1).with_pos::<u64>(4);

    let mut stream = conn.get_binlog_stream(req).await?;
    let mut name = "".to_owned();

    while let Some(Ok(event)) = stream.next().await {
        let data = event.read_data()?.unwrap();
        match data {
            EventData::RotateEvent(e) => {
                println!("[-] rotate event");
                if name.len() == 0 && event.header().timestamp() == 0 {
                    // first fake rotate event
                    name = if let Ok(Some(alg)) = event.fde().footer().get_checksum_alg() {
                        let cksum = event.calc_checksum(alg);
                        println!("    - checksum={:?}", u32::to_be_bytes(cksum));
                        let raw = e.name_raw();
                        let raw = &raw[..raw.len() - BinlogEventFooter::BINLOG_CHECKSUM_LEN];
                        String::from_utf8_lossy(raw).to_string()
                    } else {
                        e.name().to_string()
                    }
                } else {
                    name = e.name().to_string();
                }
                println!("    - name={}", name);
                println!("    - orig name={:?}", e.name_raw());
            }
            _ => {}
        }
    }

    Ok(())
}

Output:

[-] rotate event
    - checksum=[33, 68, 223, 28]
    - name=mariadb-bin.000001
    - orig name=[109, 97, 114, 105, 97, 100, 98, 45, 98, 105, 110, 46, 48, 48, 48, 48, 48, 49, 82, 60, 115, 84]
[-] rotate event
    - name=mariadb-bin.000002
    - orig name=[109, 97, 114, 105, 97, 100, 98, 45, 98, 105, 110, 46, 48, 48, 48, 48, 48, 50]

andrenth avatar Apr 25 '22 14:04 andrenth

Do you think this workaround makes sense? I'm not sure if the calculated checksums are really supposed to match with extra bytes after the file name.

andrenth avatar Apr 27 '22 20:04 andrenth

I've tried replacing MariaDB with MySQL, and in that setup the extra bytes do not appear, even with binlog_checksum = CRC32. In MySQL, event.fde().footer().get_checksum_alg() returns BinlogChecksumAlg::BINLOG_CHECKSUM_ALG_OFF, while in MariaDB it returns BinlogChecksumAlg::BINLOG_CHECKSUM_ALG_CRC32.

I've also noticed that the extra bytes only seem to appear in the first event yielded by the BinlogStream, and this seems to happen regardless of the file and position parameters given to the BinlogRequest. So currently I'm using this code to extract the binlog file name from a RotateEvent, where the first parameter is only true for the first event of the stream.

fn binlog_file_name(event: &RotateEvent, footer: &BinlogEventFooter, first: bool) -> String {
    if !first {
        return event.name().to_string();
    }   
    if let Ok(Some(BinlogChecksumAlg::BINLOG_CHECKSUM_ALG_CRC32)) = footer.get_checksum_alg() {
        let name = event.name_raw();
        let name = &name[..name.len() - BinlogEventFooter::BINLOG_CHECKSUM_LEN];
        return String::from_utf8_lossy(name).to_string();
    }   
    event.name().to_string()
}

andrenth avatar Apr 29 '22 19:04 andrenth