mysqldump-to-csv

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Status: Open. anjanesh opened this issue 2 years ago (2 comments).

After writing the table to a CSV, I tried to open the generated CSV on my Windows 11 machine and found that it contains the byte 0xff:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

So I had to open it as UTF-16 instead:

with open('tables-imported.csv', 'r', encoding='utf-16') as f:
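A 0xff at position 0 is a strong hint: a UTF-16 little-endian file starts with the byte order mark (BOM) 0xff 0xfe. If you don't know in advance which encoding a CSV was saved in, a minimal sketch is to sniff the BOM before opening it. `sniff_encoding` is a hypothetical helper; it only looks at UTF-16 and UTF-8 BOMs and assumes plain UTF-8 otherwise.

```python
import codecs


def sniff_encoding(path):
    """Guess a file's text encoding from its byte order mark (BOM).

    Hypothetical helper: only UTF-16 and UTF-8 BOMs are checked, and a
    file with no BOM is assumed to be plain UTF-8.
    """
    with open(path, "rb") as f:
        head = f.read(3)
    # FF FE (little-endian) or FE FF (big-endian): UTF-16. The "utf-16"
    # codec reads the BOM itself, picks the byte order, and strips it.
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # EF BB BF: UTF-8 with a BOM; "utf-8-sig" strips it while decoding.
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return "utf-8"
```

Usage: `open('tables-imported.csv', newline='', encoding=sniff_encoding('tables-imported.csv'))`, then hand the file object to `csv.reader`. A file with no BOM could still be UTF-16, so this is a heuristic, not a guarantee.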

anjanesh, Mar 23 '23 05:03

Slightly more precise repro:

python mysqldump-to-csv/mysqldump_to_csv.py <enwiki-latest-categorylinks.sql

blows up with:

Traceback (most recent call last):
  File "/home/ciro/down/wiki/mysqldump-to-csv/mysqldump_to_csv.py", line 114, in <module>
    main()
  File "/home/ciro/down/wiki/mysqldump-to-csv/mysqldump_to_csv.py", line 104, in main
    for line in fileinput.input():
  File "/usr/lib/python3.11/fileinput.py", line 251, in __next__
    line = self._readline()
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/fileinput.py", line 372, in _readline
    return self._readline()
           ^^^^^^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdc in position 1980: invalid continuation byte

The likely reason is that the file contains binary data in the third column. It's a dumpster fire:

INSERT INTO `categorylinks` VALUES (10,'Redirects_from_moves','*..2NN:,@2.FBHRP:D6^A^W^Aܽ<DC>^L','2014-10-26 04:50:23','','uca-default-u-kn','page'),
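The `<DC>^L` near the end of the third value is presumably the terminal's rendering of a raw 0xDC byte followed by 0x0C (form feed). In UTF-8, 0xDC can only start a 2-byte sequence, so the next byte must be in the range 0x80-0xBF; 0x0C is not, which gives exactly the "invalid continuation byte" error above. A small sketch to locate such a byte, using a hypothetical shortened excerpt of that value (`first_utf8_error` is an illustrative name):

```python
def first_utf8_error(data: bytes):
    """Return (offset, reason) for the first invalid UTF-8 byte, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the offending byte in `data`.
        return err.start, err.reason
    return None


# Hypothetical excerpt: a raw 0xDC byte followed by 0x0C (form feed),
# matching what the terminal showed as "<DC>^L" in the INSERT line.
sample = b"'Redirects_from_moves','*..\xdc\x0c'"
print(first_utf8_error(sample))  # (27, 'invalid continuation byte')
```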

Running the same command on enwiki-latest-page.sql still works.

cirosantilli, Oct 10 '23 13:10

I'm not entirely sure why, but the solution at https://github.com/jamesmishra/mysqldump-to-csv/issues/17 worked for me. It likely just treats the input more byte-wise; the output may be garbled when printed, but at least it doesn't blow up.
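This is not necessarily what issue #17 does, but one common way to make a line-by-line reader tolerate raw bytes in Python is the `surrogateescape` error handler: each undecodable byte becomes a lone surrogate when reading and turns back into the same byte when writing with the same handler. It also relates to the print concern: writing lone surrogates to a stdout that uses the default strict handler raises `UnicodeEncodeError`, so the sketch applies the handler on the output side too. A minimal sketch, assuming the dump is piped in on stdin as in the repro above; `copy_dump` is a hypothetical name, and it only passes lines through where the real script would parse INSERT statements.

```python
import io


def copy_dump(binary_in, binary_out):
    """Copy a dump line by line without failing on invalid UTF-8.

    errors="surrogateescape" maps each undecodable byte to a lone
    surrogate (U+DC80..U+DCFF) when reading and maps it back to the same
    byte when writing, so binary column data passes through unchanged.
    Returns the number of lines copied.
    """
    # newline="" splits lines but returns line endings untranslated,
    # and disables newline translation on output: a byte-exact copy.
    src = io.TextIOWrapper(binary_in, encoding="utf-8",
                           errors="surrogateescape", newline="")
    dst = io.TextIOWrapper(binary_out, encoding="utf-8",
                           errors="surrogateescape", newline="")
    count = 0
    for line in src:
        # A real converter would parse INSERT rows here and write CSV;
        # this sketch just passes each line through.
        dst.write(line)
        count += 1
    dst.flush()
    # Detach so discarding the wrappers does not close the caller's streams.
    dst.detach()
    src.detach()
    return count
```

Usage: `copy_dump(sys.stdin.buffer, sys.stdout.buffer)`. Working on the `.buffer` byte streams sidesteps whatever encoding the terminal or locale gives `sys.stdin` and `sys.stdout`.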

cirosantilli, Oct 10 '23 13:10