python-mysql-replication

Decoding binary as utf-8

Open martinamps opened this issue 9 years ago • 6 comments

I get this trace:

Traceback (most recent call last):
  File "./test.py", line 33, in <module>
    main()
  File "./test.py", line 26, in main
    for binlogevent in stream:
  File "/usr/local/lib/python2.7/dist-packages/pymysqlreplication/binlogstream.py", line 262, in fetchone
    self.__freeze_schema)
  File "/usr/local/lib/python2.7/dist-packages/pymysqlreplication/packet.py", line 98, in __init__
    freeze_schema = freeze_schema)
  File "/usr/local/lib/python2.7/dist-packages/pymysqlreplication/event.py", line 141, in __init__
    self.query = tmp.decode("utf-8")
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xae in position 460: invalid start byte

I printed a repr of the packet and it is essentially:

INSERT INTO x(......, ip) VALUES (....,  'b\xae\xe1\xbd');

This is row-based replication; the master was originally sent INET6_ATON('::1'), for example.
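
For anyone wanting to reproduce the failure without a MySQL server, here is a minimal Python 3 sketch. The statement bytes are a made-up stand-in modelled on the repr above (the elided columns are simply left out), and `try_decode` is a hypothetical helper, not part of pymysqlreplication.

```python
# Hypothetical statement bytes, modelled on the repr shown above.
raw = b"INSERT INTO x(ip) VALUES ('b\xae\xe1\xbd');"


def try_decode(data):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, exc


text, error = try_decode(raw)
if error is not None:
    # error.start is the offset of the first byte the codec rejected.
    print("byte 0x%02x at offset %d: %s"
          % (raw[error.start], error.start, error.reason))
```

0xae has the bit pattern of a UTF-8 continuation byte, so it cannot begin a character; that is what "invalid start byte" refers to.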

What's the recommended solution here? I'm surprised no one else has hit this as many column types leverage binary.

Thanks!

martinamps avatar Sep 16 '15 22:09 martinamps

Can you post the type of the column?

julien-duponchelle avatar Sep 17 '15 06:09 julien-duponchelle

Apologies - it’s a varbinary(15)

martinamps avatar Sep 17 '15 06:09 martinamps

No problem :) Thanks a lot for the report. Sorry if my question came across as rude, I had just woken up :P

julien-duponchelle avatar Sep 17 '15 06:09 julien-duponchelle

No problem at all. Let me know if I can help debug further. For now I've just wrapped it in a try: except: block so it fails a bit more gracefully; I was planning to dive in a little deeper tomorrow. Bed time here on the west coast!
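
A rough Python 3 sketch of that kind of caller-side workaround, for illustration only. `FakeStream` and `read_all` are hypothetical names; `FakeStream` merely imitates a stream that raises while iterating, and whether the real reader can keep going after such an exception is an assumption not verified here.

```python
class FakeStream:
    """Hypothetical stand-in for a binlog stream; not the library's reader."""

    def __init__(self, payloads):
        self._payloads = iter(payloads)

    def __iter__(self):
        return self

    def __next__(self):
        # Decode lazily so a bad payload raises while iterating,
        # as in the traceback above.
        return next(self._payloads).decode("utf-8")


def read_all(stream):
    """Collect decodable events and count the ones that failed."""
    events, skipped = [], 0
    iterator = iter(stream)
    while True:
        try:
            events.append(next(iterator))
        except StopIteration:
            break
        except UnicodeDecodeError:
            skipped += 1  # the event is dropped, not recovered
    return events, skipped


events, skipped = read_all(
    FakeStream([b"BEGIN", b"VALUES ('b\xae\xe1\xbd')", b"COMMIT"])
)
```

The try/except sits around each `next()` call rather than around the whole `for` loop, so one bad event does not end the iteration. The failing event is still lost, which is why this is only a stopgap.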

martinamps avatar Sep 17 '15 06:09 martinamps

I think we can either ignore the invalid Unicode characters (so existing apps don't break) or return a byte string.
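
To make the two options concrete, a small Python 3 sketch using only the standard `bytes.decode` error handlers. `decode_query` is a hypothetical helper written for this comment, not the library's code, and the sample bytes are shortened from the repr earlier in the thread.

```python
def decode_query(raw, errors="strict"):
    """Decode as UTF-8; if that fails, hand back the original bytes."""
    try:
        return raw.decode("utf-8", errors)
    except UnicodeDecodeError:
        return raw


raw = b"VALUES ('b\xae\xe1\xbd')"

# Option 1: never raise. Undecodable bytes are dropped or replaced.
ignored = raw.decode("utf-8", errors="ignore")
replaced = raw.decode("utf-8", errors="replace")  # inserts U+FFFD markers

# Option 2: keep the data intact by returning a byte string.
as_bytes = decode_query(raw)
```

The trade-off: `errors="ignore"` keeps the return type a text string but silently changes the statement, while returning bytes preserves the data and pushes a type check onto callers.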

julien-duponchelle avatar Sep 17 '15 06:09 julien-duponchelle

I agree. It would probably be useful to parse it out in future, but for now the potential exceptions should be fixed.

martinamps avatar Sep 17 '15 06:09 martinamps