redash icon indicating copy to clipboard operation
redash copied to clipboard

Select all queries throwing errors.

Open lovelady opened this issue 8 years ago • 21 comments

Looks like all my queries work but just noticed that "select * from any_table" results in the following error:

Error running query: 'utf8' codec can't decode bytes in position 0-1: invalid continuation byte

Not preventing me from using the product but I like to do select * queries on some tables while I build my queries.

lovelady avatar Jan 09 '17 08:01 lovelady

I don't think it's an issue with SELECT *, but with the data it returns. What data source type are you using?

arikfr avatar Jan 09 '17 08:01 arikfr

MySQL innodb. Thanks for being so prompt! :)

lovelady avatar Jan 09 '17 08:01 lovelady

Not aware of specific issues with utf8 in MySQL, but can you check your server logs for a stacktrace to see where this error comes from?

arikfr avatar Jan 09 '17 08:01 arikfr

Ok...Just running some test queries. Its pretty clear that it's the hash columns that I have stored in my bigger tables. It appears it doesn't like all those crazy characters.

lovelady avatar Jan 09 '17 08:01 lovelady

Btw, Redash doesn't apply any limits on the queries. So if you have huge tables, it will try to return all the rows.

arikfr avatar Jan 09 '17 08:01 arikfr

yeah, these aren't big tables (yet) and I wouldn't normally run these queries....I was just trying to get a look at all the columns when I found the error.

lovelady avatar Jan 09 '17 08:01 lovelady

👍

arikfr avatar Jan 09 '17 08:01 arikfr

I'm not too worried about this but any thoughts on a fix in the near or far future?

lovelady avatar Jan 09 '17 08:01 lovelady

Need to see the log to understand where this comes from.

arikfr avatar Jan 09 '17 08:01 arikfr

which log?

lovelady avatar Jan 09 '17 08:01 lovelady

How did you setup Redash?

arikfr avatar Jan 09 '17 08:01 arikfr

on AWS using one of you AMI's.

lovelady avatar Jan 09 '17 08:01 lovelady

Cool, then the log is at: /opt/redash/logs/celery_error.log.

arikfr avatar Jan 09 '17 08:01 arikfr

celery.log.zip

Here ya go. Thanks!

lovelady avatar Jan 09 '17 10:01 lovelady

I can reproduce this when a column has BINARY type in MySQL. (We store uuid4 string as binary to reduce data size)

We have a MySQL function to convert it to normal uuid4, so we don't have a problem though.

rdtr avatar Sep 01 '17 18:09 rdtr

Also reproducible on SQL Server Geography types.

alexsorokoletov avatar Jun 12 '18 09:06 alexsorokoletov

Looks like it's related to how pymssql handles binary/SQLGeography types in SQL Server. A Workaround is to convert binary to something else, or convert SQLGeography column to text by calling column.STAsText() https://docs.microsoft.com/en-us/sql/t-sql/spatial-geography/stastext-geography-data-type?view=sql-server-2017

Wondering if redash can do that manually when detecting binary column and show it in HEX, for example.

alexsorokoletov avatar Jun 13 '18 15:06 alexsorokoletov

I was able to monkeypatch this. https://github.com/getredash/redash/compare/master...dodopizza:fix-mysql-binary-columns

We store UUIDs in MySQL as BINARY(16) columns. They are everywhere in all our databases. When we query select * from orders in redash, it fails with UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 4: invalid start byte. It means that json serialization fails because it tries to encode mysql result as utf8 string, which it is not, just UUID bytes. This is happening because MySQLdb treats BINARY as bytes, which is just a synonym of str in Python2. I think in Python 3 it wouldn't be a problem, because it wouldn't be unicode The problematic line is https://github.com/PyMySQL/mysqlclient-python/blob/17045e8ed04579ec2d80e9daa9c7b69fbfcd5c2f/MySQLdb/converters.py#L88 _bytes_or_str = ((FLAG.BINARY, bytes), (None, unicode)) I monkeypatched this line to _bytes_or_str = ((FLAG.BINARY, buffer), (None, unicode)) to resemble the PostgreSQL behavior: http://initd.org/psycopg/docs/usage.html#adaptation-of-python-values-to-sql-types Psycopg treats bytea(which is BINARY) as buffer. Redash already has hexlify for buffer type: https://github.com/getredash/redash/blob/ef9a4d5eed82605ad672b4717c75c87c4b8609fa/redash/utils/init.py#L104-L105 With this monkey patching select * from orders works and it shows our UUIDs as hexes. So, what do you think? Should we report the problem with buffer in python2 to MySQLDb? Or just monkeypatch it in redash?

spacentropy avatar Aug 27 '19 09:08 spacentropy

Thank you for looking into this, @spacentropy.

The monkey patching solution looks reasonable. But I wonder if we can do this on our end, by checking either the Cursor's description or description_flags properties to check if some column is a binary?

Regardless it's worth opening an issue on the mysql-client repo to get their feedback as well (but move forward with our own solution).

arikfr avatar Aug 27 '19 19:08 arikfr

@arikfr BINARY columns now work fine in MySQL as of 9.0.0-beta. This must be result of porting to Python3 and having proper unicode/bytes types.

Thanks!

spacentropy avatar Jun 13 '20 21:06 spacentropy

Similar issues still happens - I installed redash on a local VM running Ubuntu 18. Installation with the model script. Select * FROM ... a 2014 SQL server results to similar error. "Error running query: 'utf8' codec can't decode bytes in position 4-5: invalid continuation byte"

I have not monkeypatched, yet.

gauiPPP avatar Oct 17 '22 07:10 gauiPPP