Unicode characters in column headers break when exporting to png

Open mrapacz opened this issue 3 weeks ago • 0 comments

Description

When using GT.save() to save a table to a PNG file, Unicode characters in the column headers are not rendered correctly.

Reproducible example

I cloned the repo and installed it with pip install -e .[all]. Then I ran the following code:

import pandas as pd
from great_tables import GT

df = pd.DataFrame({"Żaba": ["1", "2"], "Koń": ["3", "4"]})
GT(df).save("output.png")

The output file output.png renders the column headers as "Ĺ»aba" and "KoĹ„" instead of the expected "Żaba" and "Koń".

Expected result

The Unicode characters in the column headers should be rendered correctly.

Development environment

  • Operating System: macOS Sequoia 15.6.1
  • great_tables Version: Tested on the current main branch: a59301b. Also present in v0.20.0.

Additional context

I did some extra digging to try to understand what the root cause of the problem is:

  1. The as_raw_html() call in the save() function is not passed an explicit make_page parameter, so it runs with the default make_page=False.

  2. This, in turn, means the table is rendered as a div (inline) element only.

  3. This HTML is later saved to a temporary file and opened using a webdriver.

  4. The webdriver takes a screenshot and thus generates the PNG.

The default webdriver is Chrome.

What's happening is that Chrome (or, specifically, the compact_enc_det library it uses) doesn't detect the right charset and so garbles the characters: document.characterSet ends up as windows-1250 instead of utf-8.
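
The garbled headers from the reproducible example are consistent with this diagnosis. As a minimal sketch using only Python's standard codecs (it does not invoke Chrome or compact_enc_det, and the helper name is made up for illustration), decoding the UTF-8 bytes of the headers as windows-1250 (cp1250 in Python) produces the same broken strings:

```python
# Illustration only: mimic a reader that decodes UTF-8 bytes as windows-1250.
# This is not Chrome's detection logic, just the resulting misinterpretation.
def misread_as_cp1250(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as windows-1250."""
    return text.encode("utf-8").decode("cp1250")


for header in ["Żaba", "Koń"]:
    print(header, "->", misread_as_cp1250(header))
```

Both "Ż" and "ń" are two-byte sequences in UTF-8, and each byte maps to a separate character in windows-1250, which is why every non-ASCII letter turns into two characters in the PNG.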

Workarounds / Fixes

There are a couple of possible solutions/workarounds. All of them boil down to one of three things: choosing a different web driver, telling Chrome explicitly which charset we want, or changing the HTML structure so that Chrome infers the right charset.

The ones I've identified:

  1. Prepend a charset meta tag (<meta charset='utf-8'>) to the rendered HTML div (make_page is still False here). This is enough for Chrome to infer the character set correctly.
html_content = "<meta charset='utf-8'>" + html_content
  2. Pass make_page=True. This works because the rendered page contains the charset definition. The PNGs I rendered this way seemed to be the same as the ones from the current main branch, but it might be a more invasive fix than (1).
html_content = as_raw_html(self, make_page=True)
  3. Use a different webdriver. Firefox, for one, recognizes the charset correctly. Users can always pick a different web driver themselves, or the default web driver can be changed. (The former sounds like bad UX; the latter seems unwarranted given the obscurity of this issue.)
(
    GT(df)
    .save(file="output.png", web_driver="firefox")
)
  4. Use ASCII-based column names and relabel them. In the breaking scenario, Chrome needs to interpret this:
<th id="Żaba">Żaba</th>

However, when doing this:

df_safe = df.rename(columns={"Żaba": "Zaba", "Koń": "Kon"})
GT(df_safe).cols_label(Zaba="Żaba", Kon="Koń").save("output.png")

the html ends up being:

<th id="Zaba">Żaba</th>

which is interpreted by Chrome correctly.
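
If workaround (1) were the chosen fix, it could be wrapped in a small guard. The sketch below is hypothetical: ensure_utf8_charset is not part of great_tables, and where save() would call it is an assumption. It prepends the meta tag only when the HTML does not already declare a charset, so it would leave full pages rendered with make_page=True unchanged.

```python
import re

# Hypothetical helper, not part of great_tables.
# Matches any <meta ...> tag that mentions a charset declaration.
_CHARSET_RE = re.compile(r"<meta[^>]*charset\s*=", re.IGNORECASE)


def ensure_utf8_charset(html_content: str) -> str:
    """Prepend <meta charset='utf-8'> unless a charset is already declared."""
    if _CHARSET_RE.search(html_content):
        return html_content
    return "<meta charset='utf-8'>" + html_content


fragment = '<div><table><th id="Żaba">Żaba</th></table></div>'
print(ensure_utf8_charset(fragment))
```

Applying the helper twice gives the same result as applying it once, so calling it on already-patched HTML would be harmless.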

I'd be more than happy to contribute a PR, even if it's a one-liner fix. Thanks a lot for the great great_tables!

mrapacz, Nov 07 '25 18:11