heatpumps icon indicating copy to clipboard operation
heatpumps copied to clipboard

feat: Start adding multi-language support

Open zarch opened this issue 1 year ago • 6 comments

PR that tried to fix #7.

The main changes are:

  • Start introducing 'gettext' on dashboard and variables to traslate the name expose on the streamlit interface,
  • Add also files to translate from english to german,
  • Remove line on '.gitignore' file to not ignore translation files

The current state is not fully functional, in particular I had issues with st.subheader like for instance:

st.subheader("Log(p)-h Diagram")

If I change them to:

st.subheader(_("Log(p)-h Diagram"))

When I execute the code I get:

Traceback (most recent call last):
  File "/data/src/zheatpumps/.venv/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
             ^^^^^^
  File "/data/src/zheatpumps/.venv/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 590, in code_to_exec
    exec(code, module.__dict__)
  File "/data/src/zheatpumps/src/heatpumps/hp_dashboard.py", line 1025, in <module>
    st.subheader(_("Log(p)-h Diagram"))
                 ^^^^^^^^^^^^^^^^^^^^^
TypeError: 'DeltaGenerator' object is not callable

That I don't know exactly how to fix.

I tried to build a minimal example:

import pathlib
import gettext
import streamlit as st

PRJDIR = pathlib.Path(__file__).parent.absolute()
LOCDIR = PRJDIR / "locales"
STADIR = PRJDIR / "src" / "heatpumps" / "static"
IMGDIR = STADIR / "img"

print(f"{LOCDIR=}")
print(f"{STADIR=}")
print(f"{IMGDIR=}")

# Initialize gettext
lang = gettext.translation("dashboard", localedir=LOCDIR, languages=["de"])
lang.install()
_ = lang.gettext

print(f"Translate `T-s Diagram`: {_("T-s Diagram")}")
print(f"Translate `Log(p)-h Diagram`: {_("Log(p)-h Diagram")}")

# Streamlit code
st.set_page_config(
    layout="wide",
    page_title=_("Heat Pump Dashboard - test translation"),
    page_icon=(IMGDIR / "page_icon_ZNES.png").as_posix(),
)

with st.sidebar:
    st.subheader(_("T-s Diagram"))
    st.subheader(_("Log(p)-h Diagram"))

But if I run the code it works:

❯ uv run streamlit run loc.py

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://192.168.1.35:8501

LOCDIR=PosixPath('/data/src/syn/tespy_test/heatpumps/locales')
STADIR=PosixPath('/data/src/syn/tespy_test/heatpumps/src/heatpumps/static')
IMGDIR=PosixPath('/data/src/syn/tespy_test/heatpumps/src/heatpumps/static/img')
Translate `T-s Diagram`: T-s Diagramm
Translate `Log(p)-h Diagram`: Log(p)-h Diagramm
^[^C  Stopping...

I'm not a streamlit expert do you have any idea suggestion on how should I fix this?

zarch avatar Oct 10 '24 08:10 zarch

Hi @zarch,

interesting approach you suggest (I have just seen this as I opened my PR, so I thought I'd jump on it).

I have read a little bit about the streamlit and gettext approaches (https://readmedium.com/how-to-build-a-multi-language-dashboard-with-streamlit-9bc087dd4243). What I find a little bit strange, is the fact that apparently you have to regenerate the files in case the code changes? Or does that happen automatically?

Best

Francesco

fwitte avatar Oct 15 '24 17:10 fwitte

Hi @fwitte ,

What I find a little bit strange, is the fact that apparently you have to regenerate the files in case the code changes? Or does that happen automatically?

I'm not an expert on translation, but from other projects that I've seen, every time that a message msgid is changed the translation should be changed too, otherwise it will display the original msgid string. The msgid is the string contains in _("<msgid>") within the python code.

So to answer to your question, it is possible to automatically update/generate the template using a tool and command like:

$ pygettext3.12 -d base -o locales/dashboard.pot src/heatpumps/hp_dashboard.py

However, for each msgid that has been changed in the code you also need to manually update the files in locales/{lang-code}/LC_MESSAGES/*.po and then build the binary used by gettext with:

$ msgfmt -o locales/de/LC_MESSAGES/dashboard.mo locales/de/LC_MESSAGES/dashboard.po

To me this process/check should be performed before a release, I do not know if there are any github actions that perform these kind of checks automatically. Having a short look preparing this answer I see potranslator that automatically translate using google translator the msgid but I do not know if the output is good enough and I do not see github actions ready to use that can automatically triggered and generate / update all the related files.

A project that I follow grass use weblate that, in case of grass is self-hosted on OSGeo. From their website I see that for opensource project it is possible to apply for free.

zarch avatar Oct 16 '24 07:10 zarch

Hey @zarch and @fwitte,

thank you for your proposals regarding this issue. We are very glad to welcome anyone who wants to help build great software. My first instinct to tackle this issue were dictionaries, but you are right that gettext seems to be the recommended tool. To me, the necessary workflow seems a bit more unintuitive and therefore possibly a hurdle for external contributors, if it is not documented thoroughly and straight forward. This holds especially true, if there is no automation like GitHub Actions to automate the building of the binaries and so forth. Furthermore, I think we should refrain from using automatic translations and rather do those by hand, as we deal with technical language, which for now seems to be a mixed bag in the quality of the translations. If that reliably improves at some point, I would not object to using automation any more.

Two hints regarding the issues with your minimal example and the PR:

  1. A quick search of the error you are getting (TypeError: 'DeltaGenerator' object is not callable) seems to lead to a streamlit object being called, i.e. is followed by a pair of paratheses, that is not intended to be called. This example from the official forum has a streamlit.column object being called like a function, when it should not.
  2. The gettext module can not and will not handle f-Strings, as those can not be evaluated before runtime. This issue I found talks about that and gives the alternative to first let gettext translate the part without the formatted string and fill it in afterwards (e.g. _("my string {}").format(variable)). This is a bit annoying, as we like using f-strings and they are therefore found quite often inside the dashboard codebase.

Best regards, Jonas

jfreissmann avatar Oct 16 '24 10:10 jfreissmann

Hi @jfreissmann ,

Thank you for your feedback.

My first instinct to tackle this issue were dictionaries, but you are right that gettext seems to be the recommended tool.

I'm afraid that using a custom dictionary mechanism will not improve a lot. You still need to:

  • extract from the code or specify from the code the string that can be translated, similar on what gettext is doing with _();
  • make the translation persistent in a python or json file;
  • make a mechanism that update the keys;
  • and you still have to keep the translation file update once on a while.

More or less is what gettext is doing already. Using a "standard" has a plus that contributors can used other tools to support the translation e.g. poedit or the previous mentioned weblate.

To me, the necessary workflow seems a bit more unintuitive and therefore possibly a hurdle for external contributors, if it is not documented thoroughly and straight forward. This holds especially true, if there is no automation like GitHub Actions to automate the building of the binaries and so forth.

I agree, the full workflow is not clear to me too :rofl: From my quick research I did not found a Github Actions that is ready to be used for this use case, but I believe that once the workflow is clear it should be possible to implement an Action to support on this task.

I think we should refrain from using automatic translations and rather do those by hand, as we deal with technical language, which for now seems to be a mixed bag in the quality of the translations.

It is fine for me, I just found the project and I bring it into the discussion I have no strong opinion on that.

A quick search of the error you are getting (TypeError: 'DeltaGenerator' object is not callable) seems to lead to a streamlit object being called, i.e. is followed by a pair of paratheses, that is not intended to be called. This example from the official forum has a streamlit.column object being called like a function, when it should not.

I saw the discussion but I was not able to understand how to apply on our case. Especially because I tried to build a minimum reproducible examples and it works... So I'm not sure that I properly understood where the problem is.

The gettext module can not and will not handle f-Strings, as those can not be evaluated before runtime. https://github.com/QubesOS/qubes-issues/issues/7824 talks about that and gives the alternative to first let gettext translate the part without the formatted string and fill it in afterwards (e.g. _("my string {}").format(variable)). This is a bit annoying, as we like using f-strings and they are therefore found quite often inside the dashboard codebase.

I'm aware of this and the current version is already dealing / removing / splitting the text to avoid this kind of issues. :smile:

zarch avatar Oct 16 '24 11:10 zarch

I'm afraid that using a custom dictionary mechanism will not improve a lot. You still need to:

extract from the code or specify from the code the string that can be translated, similar on what gettext is doing with _();
make the translation persistent in a python or json file;
make a mechanism that update the keys;
and you still have to keep the translation file update once on a while.

More or less is what gettext is doing already.

Full agreeance.

I agree, the full workflow is not clear to me too 🤣 From my quick research I did not found a Github Actions that is ready to be used for this use case, but I believe that once the workflow is clear it should be possible to implement an Action to support on this task.

I think, a well documented workflow is necessary, but also sufficient to make the decision to move on and use gettext.

I'm aware of this and the current version is already dealing / removing / splitting the text to avoid this kind of issues. 😄

Sorry, i must have missed that between all the changes. My bad!

jfreissmann avatar Oct 17 '24 06:10 jfreissmann

Meanwhile, I think, I found the cause of the problem your PR is facing. The error occured in line 1025, which is preceeded by the creation of multiple columns (your line 1015), some of which are only used as padding and were therefore not needed in the further code. In accordance with best practices, we used the underscore _ character to signal to others, that these columns are indeed unused. This usage overwrites the gettext underscore method and the error occurs, as the column object is not callable.

# %% State Diagrams
col_left, _, col_right = st.columns([0.495, 0.01, 0.495])
_, slider_left, _, slider_right, _ = st.columns([0.5, 8, 1, 8, 0.5])

if is_dark:
    state_diagram_style = "dark"
else:
    state_diagram_style = "light"

with col_left:
    # %% Log(p)-h-Diagram
    st.subheader(_("Log(p)-h Diagram"))

We could either rename these unused columns. On the other hand, streamlit now allows to pass a keyword argument to chose between a small, medium or large gap between columns. Maybe this option makes the use of padding columns obsolete.

jfreissmann avatar Oct 17 '24 06:10 jfreissmann