Filter out ASCII characters not supported by BigQuery

Open medvedev1088 opened this issue 5 years ago • 3 comments

BigQuery fails when trying to load a CSV with ASCII 0 with the following message:

Error: Bad character (ASCII 0) encountered.

We need to check which other characters BigQuery does not support and filter them out as well (see https://en.wikipedia.org/wiki/ASCII).

https://github.com/medvedev1088/ethereum-etl/blob/master/ethereumetl/jobs/export_tokens_job.py#L64

This should probably be a separate Python script with the filtering logic (not in export_tokens_job.py).

medvedev1088 avatar Aug 30 '18 08:08 medvedev1088

Could this perhaps be an individual function inside the ethereumetl/utils.py file?

For example, a clean_user_provided_content(content) function that export_tokens_job.py (and other scripts) could use via from ethereumetl.utils import clean_user_provided_content.
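A minimal sketch of what such a helper might look like, assuming it only needs to drop the ASCII 0 character named in the BigQuery error. The strip_other_control_chars flag and the OTHER_CONTROL_CHARS list are hypothetical: the issue does not establish which other characters BigQuery rejects, so that part is an illustration only and is off by default.

```python
# Hypothetical sketch; not taken from the ethereum-etl codebase.
NUL = 0  # code point of ASCII 0, the character named in the BigQuery error

# Assumption: other C0 control characters (except tab, LF and CR) might
# also be worth stripping. The issue does not confirm that BigQuery
# rejects them, so they are only removed when explicitly requested.
OTHER_CONTROL_CHARS = [c for c in range(1, 32) if c not in (9, 10, 13)]


def clean_user_provided_content(content, strip_other_control_chars=False):
    """Return content with ASCII 0 removed; non-string values pass through."""
    if not isinstance(content, str):
        return content
    # str.translate deletes every code point that maps to None.
    table = {NUL: None}
    if strip_other_control_chars:
        table.update({c: None for c in OTHER_CONTROL_CHARS})
    return content.translate(table)
```

Token names and symbols could then be passed through clean_user_provided_content(...) before being written to CSV. Non-string fields such as decimals are returned unchanged.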

tpmccallum avatar Mar 10 '19 11:03 tpmccallum

Alternatively, it might be cleaner to use the existing Unidecode library [1]. Any .py file in the ETL application could then clean up strings by importing it with from unidecode import unidecode and calling it inline, like this: clean_content = unidecode(str(dirty_content))

The only catch is that we will need to add pip3 install Unidecode to the installer ;-)

[1] https://pypi.org/project/Unidecode/

tpmccallum avatar Mar 10 '19 11:03 tpmccallum

Mister Medvedev, thanks a lot for the great solution (https://github.com/blockchain-etl/ethereum-etl/blob/develop/ethereumetl/jobs/export_tokens_job.py#L64). You saved my life. At least this evening.

DmitryShvetsov avatar Aug 23 '21 19:08 DmitryShvetsov