Superset 4.0.1 - Exported CSV Contains Garbled Chinese Characters and Numbers

Open glenn-gj-liang opened this issue 1 year ago • 5 comments

Bug description

I have installed the latest version of Superset (4.0.1). However, when exporting query results to a CSV file, the Chinese characters and numbers in the file are always garbled. I have tried changing the encoding to utf-8, utf-8-sig, and gbk, but the issue persists.

How to reproduce the bug

  1. Install Superset 4.0.1.
  2. Run a query that includes Chinese characters and numbers.
  3. Export the query results to a CSV file.
  4. Open the CSV file and observe that the Chinese characters and numbers are garbled.

Screenshots/recordings

No response

Superset version

4.0.1

Python version

3.11

Node version

20.12.2

Browser

Chrome

Additional context

Expected Behavior: The Chinese characters and numbers should be displayed correctly in the exported CSV file.

Actual Behavior: The Chinese characters and numbers are garbled in the exported CSV file.

Environment:

  • Superset version: 4.0.1
  • Encoding attempted: utf-8, utf-8-sig, gbk
  • Additional Context: Please provide any additional information that might help in diagnosing the

Checklist

  • [X] I have searched Superset docs and Slack and didn't find a solution to my problem.
  • [X] I have searched the GitHub issue tracker and didn't find a similar bug report.
  • [X] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

glenn-gj-liang avatar Jun 12 '24 07:06 glenn-gj-liang

Hey @TommyLeung-gj! I'm here to help you with the issue you're experiencing with Superset 4.0.1. I'm a bot that can assist you with bug fixes, answer your questions, and guide you on becoming a contributor. Let's dive into resolving the problem together!

To resolve the issue of Chinese characters and numbers being garbled when exporting query results to a CSV file in Superset 4.0.1, you need to ensure that the CSV file is encoded in UTF-8. The df_to_escaped_csv function in the superset/utils/csv.py file should be called with the appropriate encoding settings.

Here is the relevant part of the code with the necessary modification:

csv_data = df_to_escaped_csv(df, index=False, encoding='utf-8', **config["CSV_EXPORT"])

Ensure that the encoding='utf-8' parameter is included in the call to df_to_escaped_csv. This will help in correctly encoding Chinese characters and numbers in the exported CSV file [1][2].
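
As a way to narrow down which step garbles the text, here is a minimal sketch using only the Python standard library. It is not Superset code, and the sample row is made up. It shows what a correctly encoded UTF-8 CSV looks like when it is read back with the wrong codec.

```python
import csv
import io

# Hypothetical sample row; any non-ASCII text behaves the same way.
rows = [["name", "amount"], ["张三", "1234.5"]]

buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()

# Encode the CSV text the way a UTF-8 export would.
utf8_bytes = csv_text.encode("utf-8")

# Reading the same bytes back as UTF-8 round-trips cleanly.
roundtrip = utf8_bytes.decode("utf-8")

# Reading them with a single-byte codec produces mojibake for the
# Chinese characters, while the ASCII digits are left untouched.
misread = utf8_bytes.decode("latin-1")

print(roundtrip == csv_text)
print(misread)
```

If the exported file resembles the `misread` output, the bytes are probably fine and the program opening the file is guessing the wrong encoding. Note that ASCII digits survive this kind of mismatch, so garbled numbers may point to a cause other than a simple codec mismatch.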

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Jun 12 '24 09:06 dosubot[bot]

Related issue linked above ^^^

Might even be considered a duplicate, really, but they seem kind of unique.

rusackas avatar Jun 12 '24 17:06 rusackas

No, it still doesn't work after I changed the encoding to "utf-8".

glenn-gj-liang avatar Jun 17 '24 07:06 glenn-gj-liang

Maybe something is going wrong in the async query process: Celery, or the Redis results backend cache.

glenn-gj-liang avatar Jun 17 '24 07:06 glenn-gj-liang

+1

bionexit avatar Jun 24 '24 14:06 bionexit

+2

Having the same issue with version 4.0.1, even though version 2.1.3 works fine with the same configuration parameters.

Habeeb556 avatar Jun 30 '24 07:06 Habeeb556

The issue has been resolved by downgrading the package with the following command: pip install Werkzeug==2.3.8.
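
Since the downgrade only helps if the running processes actually pick it up, it may be worth confirming which Werkzeug version the interpreter sees. The sketch below uses the standard library's `importlib.metadata`; the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Run this with the same interpreter (virtualenv or container) that the
# Superset web server and the Celery workers use.
print("Werkzeug:", installed_version("werkzeug"))
```

Processes that were already running when the package was changed keep the old version in memory until they are restarted.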

Habeeb556 avatar Jun 30 '24 09:06 Habeeb556

I downgraded Werkzeug to 2.3.8 but no luck. What's your encoding option?

Mine is the following:

CSV_EXPORT = {"encoding": "utf-8-sig"}
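
For context, `utf-8-sig` differs from plain `utf-8` only in that it prepends a byte order mark (BOM), which some spreadsheet programs use to detect UTF-8. A small standard-library sketch, with made-up sample text and using only the `encoding` key of the setting above:

```python
import codecs

# Mirrors the setting quoted above; only the "encoding" key is used here.
CSV_EXPORT = {"encoding": "utf-8-sig"}

# Made-up sample CSV text.
csv_text = "城市,数量\r\n北京,42\r\n"

with_bom = csv_text.encode(CSV_EXPORT["encoding"])
without_bom = csv_text.encode("utf-8")

# The only difference is the three-byte BOM at the start of the output.
print(with_bom.startswith(codecs.BOM_UTF8))
print(with_bom[len(codecs.BOM_UTF8):] == without_bom)
```

Checking whether the first three bytes of an exported file are `EF BB BF` is one way to tell whether the setting reached the code that wrote the file.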

bionexit avatar Jul 01 '24 03:07 bionexit

Yes, the same encoding. But did you face the same problem with version 2.1.3 or 3.1.3? Also, I got Chinese characters, but not with the English text.

Habeeb556 avatar Jul 01 '24 05:07 Habeeb556

It worked after I reloaded the Celery service. Thanks a lot, bro.

bionexit avatar Jul 01 '24 07:07 bionexit

++ @TommyLeung-gj, could you confirm if this downgrade solves your issue or not? Also, what language are you using?

++ @bionexit, we would appreciate your feedback on which language's characters you encountered, so that it can be reported to the Werkzeug team.

Habeeb556 avatar Jul 09 '24 10:07 Habeeb556

I use Docker. How do I reload the Celery service? Thanks.

foretony5211 avatar Jul 11 '24 09:07 foretony5211

I tried downgrading to Werkzeug==2.3.8, and it works, thanks.

wuqicyber avatar Jan 20 '25 07:01 wuqicyber

@glenn-gj-liang are you by chance downloading a CSV from a Table type chart which has server-side pagination enabled? I found that server-side pagination caused Chinese/garbled characters to appear in the CSV file (I will report this as a bug later).

ruifpedro avatar Mar 26 '25 14:03 ruifpedro

Related, I think: https://github.com/apache/superset/pull/33720

Is anyone able to reproduce this on the 5.0.0 release candidates or on the master branch?

rusackas avatar Jun 16 '25 22:06 rusackas