data-diff
Bug with latest `mysql-connector-python` (version 8.0.30)
I was having issues with the MySQL connector on the latest version and had to manually downgrade it to 8.0.29.
It might be better to add lower/upper version bounds for it in the pyproject.toml of data-diff.
I can't reproduce your error. Version 8.0.30 works for me on both Windows and Linux.
Can you please provide more details on this error? How are you using data-diff, and on which platform?
We encountered a similar issue using data-diff 0.2.3 to connect to MySQL 5.7.32. data-diff also ran into a second error while handling the first one. (We should upgrade to 0.2.4, but the original error would persist.)
Traceback (most recent call last):
[INFO][2022-08-10 20:09:56 +0000] File "/usr/local/lib/python3.10/site-packages/data_diff/databases/mysql.py", line 43, in create_connection
[INFO][2022-08-10 20:09:56 +0000] return mysql.connect(charset="utf8", use_unicode=True, **self._args)
[INFO][2022-08-10 20:09:56 +0000] File "/usr/local/lib/python3.10/site-packages/mysql/connector/pooling.py", line 286, in connect
[INFO][2022-08-10 20:09:56 +0000] return CMySQLConnection(*args, **kwargs)
[INFO][2022-08-10 20:09:56 +0000] File "/usr/local/lib/python3.10/site-packages/mysql/connector/connection_cext.py", line 101, in __init__
[INFO][2022-08-10 20:09:56 +0000] self.connect(**kwargs)
[INFO][2022-08-10 20:09:56 +0000] File "/usr/local/lib/python3.10/site-packages/mysql/connector/abstracts.py", line 1099, in connect
[INFO][2022-08-10 20:09:56 +0000] self._post_connection()
[INFO][2022-08-10 20:09:56 +0000] File "/usr/local/lib/python3.10/site-packages/mysql/connector/abstracts.py", line 1071, in _post_connection
[INFO][2022-08-10 20:09:56 +0000] self.set_charset_collation(self._charset_id)
[INFO][2022-08-10 20:09:56 +0000] File "/usr/local/lib/python3.10/site-packages/mysql/connector/abstracts.py", line 1016, in set_charset_collation
[INFO][2022-08-10 20:09:56 +0000] ) = CharacterSet.get_charset_info(charset)
[INFO][2022-08-10 20:09:56 +0000] File "/usr/local/lib/python3.10/site-packages/mysql/connector/constants.py", line 775, in get_charset_info
[INFO][2022-08-10 20:09:56 +0000] info = cls.get_default_collation(charset)
[INFO][2022-08-10 20:09:56 +0000] File "/usr/local/lib/python3.10/site-packages/mysql/connector/constants.py", line 746, in get_default_collation
[INFO][2022-08-10 20:09:56 +0000] raise ProgrammingError(f"Character set '{charset}' unsupported")
[INFO][2022-08-10 20:09:56 +0000] mysql.connector.errors.ProgrammingError: Character set '255' unsupported
[INFO][2022-08-10 20:09:56 +0000]
[INFO][2022-08-10 20:09:56 +0000] During handling of the above exception, another exception occurred:
[INFO][2022-08-10 20:09:56 +0000]
[INFO][2022-08-10 20:09:56 +0000] Traceback (most recent call last):
[INFO][2022-08-10 20:09:56 +0000] File "/app/run_diff.py", line 40, in <module>
[INFO][2022-08-10 20:09:56 +0000] c = target.create_connection()
[INFO][2022-08-10 20:09:56 +0000] File "/usr/local/lib/python3.10/site-packages/data_diff/databases/mysql.py", line 50, in create_connection
[INFO][2022-08-10 20:09:56 +0000] raise ConnectError(*e._args) from e
[INFO][2022-08-10 20:09:56 +0000] AttributeError: 'ProgrammingError' object has no attribute '_args'. Did you mean: 'args'?
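The second error in the traceback above looks separate from the charset problem: exception objects expose `args`, not `_args`, so the re-raise itself fails. Here is a minimal sketch of the wrapping pattern with `e.args`. The `ConnectError` and `DriverError` classes and the `failing_driver_connect` function are hypothetical stand-ins written for this example, not data-diff's or mysql-connector's actual code.

```python
class ConnectError(Exception):
    """Hypothetical stand-in for data-diff's ConnectError."""


class DriverError(Exception):
    """Hypothetical stand-in for mysql.connector.errors.ProgrammingError."""


def failing_driver_connect(**kwargs):
    # Simulates the driver rejecting the connection, as in the traceback.
    raise DriverError("Character set '255' unsupported")


def create_connection(driver_connect, **kwargs):
    try:
        return driver_connect(**kwargs)
    except Exception as e:
        # Exception instances carry their arguments in `args` (no underscore).
        # Using `e._args` here would raise AttributeError instead.
        raise ConnectError(*e.args) from e


caught = None
try:
    create_connection(failing_driver_connect)
except ConnectError as err:
    caught = err  # keep a reference; `err` is cleared after the except block
    print(type(err.__cause__).__name__, err.args)
```

With this change the caller sees a `ConnectError` carrying the original driver message, and the original exception stays reachable through `__cause__`.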
Preliminary investigation suggests that this is an upstream problem in mysql-connector, which lists `utf8mb4_0900_ai_ci` as only supported on MySQL 8, while it is supported in MySQL 5.7 as well. It's tied to the charset handling changes they made for the 8.0.30 release. ref
So you're saying mysql-connector-python v8.0.30 introduces a bug? Did you open an issue there?
We can limit the version on our side, but I want to first make sure it's the best way forward.
Closed due to inactivity.
@erezsh Sorry for resurrecting this, but I'm hitting the same wall here.
I'm using MySQL 5.7.23 (AWS RDS) and data-diff 0.3.0rc1, and when I run this:
from data_diff import connect_to_table
user = "xxxxxxxx"
password = "xxxxxxxxxxxxxxxxx"
database = "xxxxxxxxxxxxxxxx"
hostname = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.rds.amazonaws.com"
db_info = f"mysql://{user}:{password}@{hostname}/{database}"
table = connect_to_table(
    db_info,
    table_name="mytable",
)
print(table.count())
I get something like:
mysql://xxxxx:[email protected]/xxxxxxxxxxxx
INFO:database:[MySQL] Starting a threadpool, size=1.
DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
DEBUG:database:Running SQL (MySQL): SELECT count(*) FROM `mytable`
CRITICAL:concurrent.futures:Exception in initializer:
Traceback (most recent call last):
File "/home/xxxxxxxx", line 54, in create_connection
return mysql.connect(charset="utf8", use_unicode=True, **self._args)
File "/home/xxxxxxxx/.pyenv/versions/data-diff-3.10.6/lib/python3.10/site-packages/mysql/connector/pooling.py", line 286, in connect
return CMySQLConnection(*args, **kwargs)
File "/home/xxxxxxxxxx/.pyenv/versions/data-diff-3.10.6/lib/python3.10/site-packages/mysql/connector/connection_cext.py", line 101, in __init__
self.connect(**kwargs)
File "/home/xxxxxxxxxx/.pyenv/versions/data-diff-3.10.6/lib/python3.10/site-packages/mysql/connector/abstracts.py", line 1112, in connect
self._post_connection()
File "/home/xxxxxxxxxx/.pyenv/versions/data-diff-3.10.6/lib/python3.10/site-packages/mysql/connector/abstracts.py", line 1084, in _post_connection
self.set_charset_collation(self._charset_id)
File "/home/xxxxxxxxxx/.pyenv/versions/data-diff-3.10.6/lib/python3.10/site-packages/mysql/connector/abstracts.py", line 1022, in set_charset_collation
) = CharacterSet.get_charset_info(charset)
File "/home/xxxxxxxxxx/.pyenv/versions/data-diff-3.10.6/lib/python3.10/site-packages/mysql/connector/constants.py", line 775, in get_charset_info
info = cls.get_default_collation(charset)
File "/home/xxxxxxxxxx/.pyenv/versions/data-diff-3.10.6/lib/python3.10/site-packages/mysql/connector/constants.py", line 746, in get_default_collation
raise ProgrammingError(f"Character set '{charset}' unsupported")
mysql.connector.errors.ProgrammingError: Character set '255' unsupported
Things that worked for me:
- Downgrading to `mysql-connector-python==8.0.29`
- Removing `charset="utf8"` from https://github.com/datafold/data-diff/blob/master/data_diff/databases/mysql.py#L54
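For anyone who wants to check whether their environment is past the version that worked in this thread, here is a rough sketch. It only encodes what was reported here (8.0.29 worked, 8.0.30 failed); `LAST_KNOWN_GOOD`, `parse_version` and `newer_than_known_good` are names made up for this example, and it says nothing about whether releases after 8.0.30 behave differently.

```python
import re

# Assumption taken from this thread: 8.0.29 is the last version reported to work.
LAST_KNOWN_GOOD = (8, 0, 29)


def parse_version(text):
    """Turn a version string such as '8.0.30' into a tuple of ints."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def newer_than_known_good(installed):
    """True if the installed version is later than the last one reported to work."""
    return parse_version(installed) > LAST_KNOWN_GOOD


print(newer_than_known_good("8.0.30"))
print(newer_than_known_good("8.0.29"))
```

The installed version string can be read with `importlib.metadata.version("mysql-connector-python")` from the standard library and passed to `newer_than_known_good`.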
I'm a bit lost: I can't find a way to open an issue with mysql-connector-python, and I'm not sure it is their issue. I'm not a developer and I find it hard to navigate the code. @pawandubey links to a comment about it, but I'm not sure how to proceed.
Is there something we can do?