python-package-guide
enh: create a graphic or visual that helps us visualize % translated for each section of the guide by language
One of the benefits of using a tool like Crowdin is the bar plots that quickly help a user understand how far along translations are in a specific language.
We can generate a graphic like this using Python and post it in our translation file and README file for a quick overview of the translation status. Below is a somewhat ugly, messy version of this using Babel. But it starts to get at what I'm thinking about!
If someone is interested in a Python project, you could work on this and:
- [x] Turn it into a runnable script with a main() function.
- [ ] Fix the plots to make them look nicer using perhaps a different plotting tool
- [ ] Create subplots one for each language
- [ ] Save the plot in the repository directory so we can then add it to our README and other files.
- [ ] Document everything!
from pathlib import Path

import matplotlib.pyplot as plt
from babel.messages import pofile

# Path to the locales directory: go up one directory from this script and
# locate the locales folder (I'm thinking this will live in a /scripts
# directory at the root of the repo).
BASE_DIR = Path(__file__).resolve().parent.parent
LOCALES_DIR = BASE_DIR / "locales"
print(LOCALES_DIR)


def calculate_translation_percentage(po_path):
    """Return the percentage of messages in a .po file that have a translation."""
    with open(po_path, "r", encoding="utf-8") as f:
        catalog = pofile.read_po(f)
    # len(catalog) excludes the header entry, but iterating the catalog yields
    # it first (with an empty id), so skip it when counting translations.
    total = len(catalog)
    translated = sum(1 for message in catalog if message.id and message.string)
    percent = (translated / total * 100) if total > 0 else 0
    return round(percent, 1)


def get_translation_progress(locales_dir):
    """Map "<lang>/<po file stem>" to its translation percentage."""
    progress = {}
    for lang_dir in locales_dir.iterdir():
        # Skip OS files like .DS_Store
        if not lang_dir.is_dir():
            continue
        lang = lang_dir.name
        lc_messages_dir = lang_dir / "LC_MESSAGES"
        if not lc_messages_dir.exists():
            continue
        for po_file in lc_messages_dir.glob("*.po"):
            percent = calculate_translation_percentage(po_file)
            key = f"{lang}/{po_file.stem}"
            progress[key] = percent
    return progress


# Get progress data and plot
progress = get_translation_progress(LOCALES_DIR)
print(progress)
langs = list(progress.keys())
percents = list(progress.values())
print(percents)

plt.figure(figsize=(10, 6))
plt.barh(langs, percents)
plt.xlabel("Translation %")
plt.title("Translation progress by language")
plt.xlim(0, 100)
plt.grid(axis="x", linestyle="--", alpha=0.5)
plt.tight_layout()
plt.show()
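The remaining checklist items (one subplot per language, saving the figure to the repo) could be sketched roughly like this. It assumes the `progress` dict built above, with keys of the form `lang/section`. The names `group_by_language` and `plot_by_language`, and the output file name in the usage line, are hypothetical and not something already in the repo.

```python
from collections import defaultdict
from pathlib import Path


def group_by_language(progress):
    """Turn {"es/index": 40.0} into {"es": {"index": 40.0}}."""
    grouped = defaultdict(dict)
    for key, percent in progress.items():
        lang, _, section = key.partition("/")
        grouped[lang][section] = percent
    return dict(grouped)


def plot_by_language(grouped, out_path):
    """Draw one horizontal bar chart per language and save it to out_path."""
    if not grouped:
        raise ValueError("no translation data to plot")

    # Imported here so the grouping helper works without matplotlib installed.
    import matplotlib

    matplotlib.use("Agg")  # headless backend, no window needed (e.g. in CI)
    import matplotlib.pyplot as plt

    langs = sorted(grouped)
    # squeeze=False keeps `axes` two-dimensional even with a single language.
    fig, axes = plt.subplots(
        nrows=len(langs), ncols=1, figsize=(10, 3 * len(langs)), squeeze=False
    )
    for ax, lang in zip(axes[:, 0], langs):
        sections = sorted(grouped[lang])
        ax.barh(sections, [grouped[lang][s] for s in sections])
        ax.set_xlim(0, 100)
        ax.set_xlabel("Translation %")
        ax.set_title(f"Translation progress: {lang}")
        ax.grid(axis="x", linestyle="--", alpha=0.5)
    fig.tight_layout()
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
    return Path(out_path)
```

Usage would be something like `plot_by_language(group_by_language(progress), "translation_progress.png")`, called from a `main()` function. Saving with `savefig` instead of calling `plt.show()` is what makes it usable from CI later.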
The follow-up issue associated with this would be to add CI to run this automatically each week. I think that can be a sub-issue that we can make once this issue is complete!
Looking into this one as part of PyCon US sprints -- @lwasser @flpm
Added you to the issue @RobPasMue !! Thank you!
Awesome thanks - I will break this down into:
- [x] Creating a script that goes through the `*.po` files and gets the stats - see https://github.com/pyOpenSci/python-package-guide/pull/495
- [x] Reading the stats as part of our docs and displaying them in the docs - see https://github.com/pyOpenSci/python-package-guide/pull/511
- [x] Setting up a workflow to update the data (i.e. the `*.json` file) on a schedule - see https://github.com/pyOpenSci/python-package-guide/pull/521
- [ ] Document everything! =)
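To make the first step a bit more concrete, here is a minimal sketch of turning per-file counts into a JSON stats file. It assumes the counts have already been read from the PO files as `(translated, total)` pairs. The function names and the JSON layout are made up for illustration, and the actual layout used in the PRs may differ.

```python
import json


def percentage(translated, total):
    """Percentage translated, rounded to one decimal; 0.0 for empty files."""
    return round(translated / total * 100, 1) if total else 0.0


def summarize(counts):
    """Build per-section and overall stats.

    counts: {lang: {section: (translated, total)}}
    """
    stats = {}
    for lang, sections in counts.items():
        lang_translated = lang_total = 0
        per_section = {}
        for section, (translated, total) in sections.items():
            per_section[section] = {
                "translated": translated,
                "total": total,
                "percentage": percentage(translated, total),
            }
            lang_translated += translated
            lang_total += total
        stats[lang] = {
            "sections": per_section,
            # Weighted by message count, not an average of percentages.
            "overall": {
                "translated": lang_translated,
                "total": lang_total,
                "percentage": percentage(lang_translated, lang_total),
            },
        }
    return stats


def write_stats(stats, path):
    """Write stats as stable, diff-friendly JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(stats, f, indent=2, sort_keys=True)
        f.write("\n")
```

Keeping the raw counts next to the percentages lets the docs recompute totals however they like, and sorted keys keep the diffs small when a scheduled job rewrites the file.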
I just wanna make sure people know this is already something sphinx does and we don't need to write this ourselves!!
https://sphinx-intl.readthedocs.io/en/master/refs.html#sphinx-intl-stat
https://www.sphinx-doc.org/en/master/usage/advanced/intl.html#translation-progress-and-statistics
I knew about intl-stat, and we talked about it in the sprint, but since the goal was to use the data to do some sort of chart or viz summary showing the status and areas that need help, I think it is cleaner to extract from the PO files directly instead of parsing the output of sphinx-intl.
I did not know about the second link! If I understand correctly, it is meant to highlight, inside the translated pages, the text that is not translated yet. I imagine that this is what you were thinking of for the CSS that shows those parts in a different way when we were discussing that other PR, right? I think that is definitely still worth investigating, but maybe that should be a separate issue.
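For reference, a minimal sketch of what that could look like in `conf.py`. It assumes Sphinx 7.1 or newer, where I believe the `translation_progress_classes` option was added, so this should be checked against the linked Sphinx docs before relying on it. The CSS that would style the class is left out here.

```python
# conf.py fragment (assumes Sphinx >= 7.1)

# Ask Sphinx to add an "untranslated" class to nodes whose text has no
# translation yet, so a custom stylesheet can render them differently.
# As far as I know, the other accepted values are True, False and "translated".
translation_progress_classes = "untranslated"
```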
totally - use whatever is useful, just wanted to make sure ppl were aware because the docs are (ironically) a little hard to navigate :)
I just merged the PR that has the script and associated JSON data that we can use to create a plot of translation status by section. The ability to highlight untranslated text is super interesting!! Thank you for those resources @sneakers-the-rat !!!
Now that #495 has been merged I will move on to the visualization of the data =) I'll try to open a PR shortly! Any ideas are welcome.
Displaying stats in our docs is now in a PR =) - see #511 for more details!
We appreciate you @RobPasMue thank you for all of the work on this!!
While we are almost closing #511, I also started #521 to be able to update periodically (and on demand) the translation_stats.json file =)
Ok I think this one has been completed so I'm closing it!! Please reopen if you all disagree. Hope to see you this year at PyCon or SciPy @RobPasMue and hope that all is well overseas!! ✨
Thank you @lwasser and I hope to be able to attend and see you there too!!