dbt-mysql
dbt-mysql copied to clipboard
Clarification: Data Type display on docs serve using mysql adapter
Describe the bug
Apologies, we do not know if this is a bug yet. Hence like to check with the rest of users who may be able to help. After docs generate and serve, we see that the data type are displayed with b'xxxx'. Please see screenshot below:
We do see the same in the generated target/catalog.json file. Not sure if this is adapter specific or configuration issue but we searched and could not find a solution.
Appreciate some guidance on this. Thank you!
Steps To Reproduce
In as much detail as possible, please provide steps to reproduce the issue. Sample data that triggers the issue, example model code, etc is all very helpful here.
Expected behavior
A clear and concise description of what you expected to happen.
Screenshots and log output
If applicable, add screenshots or log output to help explain your problem.
The output of dbt --version
:
The operating system you're using:
Ubuntu 22.04.1 LTS
The output of python --version
:
3.9.13
Additional context
Add any other context about the problem here.
@clementchong thanks for opening this issue!
It sounds like target/catalog.json
file contains a Python bytes literal rather than a str
literal (see here and here for further explanation).
Basically, it has encoded data that needs to be decoded in order to render properly.
Which database and database version are you using (e.g., MySQL 8.0.32
)?
Do you know what your "Connection Character Set and Collation System Variable" settings are?
- character_set_results
- character_set_server
- collation_server
- character_set_database
- collation_database
character_set_results
is "the character set used for returning query results to the client. This includes result data such as column values, result metadata such as column names, and error messages." I'm wondering if your setting might explain the bytes encoding like b'varchar(8)'
, etc.
@dbeatty10 Thank you for the quick and detailed overview. Our MySQL version is 8.0.25. The variable values are as below:
Do you see anything wrong? Or could someone share their versions if not facing such display issue?
I found the below discussion, would try out.
https://stackoverflow.com/questions/40921341/python-mysql-connector-returns-bytearray-instead-of-regular-string-value
I also found the information_schema.columns table having binary collation for the data_type field. However, I am unable to change it.
I also ran into this issue and it appears to be targeted at MySQL 8.0. The data_type
column's collation in information_schema.columns
is binary in this version, while it is non-binary in MySQL 5.7 and MariaDB 10.5. When a column is defined with binary collation, the Python MySQL connector returns the column value as a byte.
The collation settings for each database versions are as follows:
DATABASE | DATA_TYPE | CHARACTER_SET_NAME | COLLATION_NAME |
---|---|---|---|
mysql:5.7 | varchar | utf8 | utf8_general_ci |
mysql:8.0 | longtext | utf8mb3 | utf8mb3_bin |
mariadb:10.5 | varchar | utf8 | utf8_general_ci |
The different collation settings for each database version can be observed by running the following commands and changing the DOCKER_IMAGE environment variables:
$ DOCKER_IMAGE=mysql:8.0; export DOCKER_IMAGE
$ read -d '' QUERY_TEXT << EOF
select column_name,
data_type,
character_set_name,
collation_name
from information_schema.columns
where table_schema = 'information_schema'
and table_name = 'COLUMNS'
and column_name = 'DATA_TYPE';
EOF
$ docker run --name col-data-type-collation-check -e MYSQL_ALLOW_EMPTY_PASSWORD=yes -d ${DOCKER_IMAGE}
# Wait for database to start up then:
$ docker exec col-data-type-collation-check mysql -e "${QUERY_TEXT}"
COLUMN_NAME DATA_TYPE CHARACTER_SET_NAME COLLATION_NAME
DATA_TYPE longtext utf8mb3 utf8mb3_bin
# Tidy up
$ docker stop col-data-type-collation-check && docker rm col-data-type-collation-check && docker image rm ${DOCKER_IMAGE}
A proposed fix is to remove the b character prefix from dtype in the MySQLAdapter parse_show_columns function as follows:
import re
...
MySQLColumn(
table_database=None,
table_schema=relation.schema,
table_name=relation.name,
table_type=relation.type,
table_owner=None,
table_stats=None,
column=column.column,
column_index=idx,
# dtype=column.dtype,
dtype = re.sub(r"^b'|'$", "", column.dtype)
)
Dear Jeff
Thank you for the quick explanation and proposed fix. It is very helpful!
Regards Clement
On Sun, May 7, 2023, 00:32 Jeff Moszuti @.***> wrote:
I also ran into this issue and it appears to be targeted at MySQL 8.0. The data_type column's collation in information_schema.columns is binary in this version, while it is non-binary in MySQL 5.7 and MariaDB 10.5. When a column is defined with binary collation, the Python MySQL connector returns the column value as a byte.
The collation settings for each database versions are as follows: DATABASE DATA_TYPE CHARACTER_SET_NAME COLLATION_NAME mysql:5.7 varchar utf8 utf8_general_ci mysql:8.0 longtext utf8mb3 utf8mb3_bin mariadb:10.5 varchar utf8 utf8_general_ci
The different collation settings for each database version can be observed by running the following commands and changing the DOCKER_IMAGE environment variables:
$ DOCKER_IMAGE=mysql:8.0; export DOCKER_IMAGE $ read -d '' QUERY_TEXT << EOF select column_name, data_type, character_set_name, collation_name from information_schema.columns where table_schema = 'information_schema' and table_name = 'COLUMNS' and column_name = 'DATA_TYPE'; EOF
$ docker run --name col-data-type-collation-check -e MYSQL_ALLOW_EMPTY_PASSWORD=yes -d ${DOCKER_IMAGE}
Wait for database to start up then:
$ docker exec col-data-type-collation-check mysql -e "${QUERY_TEXT}" COLUMN_NAME DATA_TYPE CHARACTER_SET_NAME COLLATION_NAME DATA_TYPE longtext utf8mb3 utf8mb3_bin
Tidy up
$ docker stop col-data-type-collation-check && docker rm col-data-type-collation-check && docker image rm ${DOCKER_IMAGE}
A proposed fix is to remove the b character prefix from dtype in the MySQLAdapter parse_show_columns https://github.com/dbeatty10/dbt-mysql/blob/main/dbt/adapters/mysql/impl.py#L97-L113 function as follows:
import re ... MySQLColumn( table_database=None, table_schema=relation.schema, table_name=relation.name, table_type=relation.type, table_owner=None, table_stats=None, column=column.column, column_index=idx, # dtype=column.dtype, dtype = re.sub(r"^b'|'$", "", column.dtype) )
— Reply to this email directly, view it on GitHub https://github.com/dbeatty10/dbt-mysql/issues/108#issuecomment-1537177062, or unsubscribe https://github.com/notifications/unsubscribe-auth/AX6SX27QPBLV5PBDRFYB723XEZ4JZANCNFSM6AAAAAAQAILRU4 . You are receiving this because you were mentioned.Message ID: @.***>