fix: encoding problem when exporting CSV
Problem
CSV exports that contain special characters are displayed with the wrong encoding when the generated file is opened in Microsoft Excel.
Changes
Adds a UTF-8 Byte Order Mark (BOM) at the start of CSV exports to fix the display of special characters in Microsoft Excel.
The UTF-8 BOM is the byte sequence EF BB BF. It signals to software that the file is encoded in UTF-8, so Unicode characters are interpreted consistently, especially on systems where UTF-8 is not the default encoding.
This change aims to enhance cross-platform compatibility and prevent misinterpretation of special characters in CSV exports.
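For reviewers unfamiliar with the technique, here is a minimal sketch of how a BOM can be prepended in Python. It is an illustration only, not the code in this PR; the helper name `rows_to_csv_bytes` is hypothetical.

```python
import codecs
import csv
import io


def rows_to_csv_bytes(rows):
    """Serialize rows to CSV and return UTF-8 bytes prefixed with a BOM."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    # The "utf-8-sig" codec prepends the three BOM bytes EF BB BF on encode.
    return buffer.getvalue().encode("utf-8-sig")


data = rows_to_csv_bytes([["id", "response"], [1, "你好吗"]])
assert data.startswith(codecs.BOM_UTF8)
```

Using the `utf-8-sig` codec avoids concatenating the BOM bytes by hand, and the same codec strips the BOM again when decoding.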
How did you test this code?
- Generate a Survey
- Add a response with some special characters, e.g. "你好吗"
- Export the Survey responses
- Open the exported CSV in Microsoft Excel and check that the special characters display correctly
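The manual steps above could be complemented by an automated check. The sketch below is a standalone illustration that does not use the PostHog exporter; `write_csv_with_bom` and `read_csv` are hypothetical helpers. It writes a CSV with a BOM, confirms the first three bytes, and shows what a BOM-unaware reader sees.

```python
import csv
import os
import tempfile

ROWS = [["id", "response"], ["1", "你好吗"]]


def write_csv_with_bom(path, rows):
    # encoding="utf-8-sig" makes Python emit the BOM before the first row.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)


def read_csv(path, encoding):
    with open(path, "r", encoding=encoding, newline="") as f:
        return list(csv.reader(f))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "export.csv")
    write_csv_with_bom(path, ROWS)

    with open(path, "rb") as f:
        first_bytes = f.read(3)

    bom_aware = read_csv(path, "utf-8-sig")
    bom_unaware = read_csv(path, "utf-8")

assert first_bytes == b"\xef\xbb\xbf"
assert bom_aware == ROWS
# A reader that ignores the BOM sees U+FEFF glued to the first cell.
assert bom_unaware[0][0] == "\ufeffid"
```

The last assertion illustrates the trade-off raised later in this thread: consumers that do not expect a BOM will see an extra character in the first header cell.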
Fixes #19580
@liyiy could you please help with a review here?
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.
@liyiy could you give some help to review here?
@pauldambra @daibhin could you please help me with a review/feedback here?
@liyiy fixed lint error on commit/PR message
@liyiy rebased with main. could you take a look please? thanks!
Hey @nykolaslima
To make this pass the mypy check (mypy -p posthog | mypy-baseline filter), you'll need to change a few lines higher to: render_context: dict = {}
I'd be happy to merge it in with the change, though in my testing locally I didn't actually see the BOM symbol inside the exported CSV files, when looking at them with a hex editor... and I'm not sure why 🤔. Did you test locally that this works?
Also, the real answer is probably to implement XLS exports 😅
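As an alternative to a hex editor, the presence of the BOM in an exported file can be checked with a few lines of Python. This is a sketch; `has_utf8_bom` is a hypothetical helper and the file path is whatever the export was saved as.

```python
import codecs


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as f:
        # Read in binary mode so no decoder can silently strip the BOM.
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8
```

Calling `has_utf8_bom` on the downloaded export returns `True` only if the first three bytes on disk are EF BB BF.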
@mariusandra I had to test it with Microsoft Excel. When importing into Google Sheets, for example, it works, but in Microsoft Excel it didn't.
I will fix what you mentioned and let you know when it's done.
I agree with @mariusandra here that implementing an option for XLS exports is a more proper solution than adding a BOM to the file, especially since there's no pressing urgency for a quick fix 👍 Did you want to have a go at that?
FYI see #20568 for xlsx support
This PR was closed due to lack of activity. Feel free to reopen if it's still relevant.