How to Sync Translations for New Features?
Current Issue
Raised in https://github.com/processing/p5.js-web-editor/pull/3656 by @cassiano!
When new features are added, the English text often gets merged before translations are in place. This leads to inconsistencies where:
- Some UI elements remain untranslated until someone notices and adds them later.
- Tests that depend on translation keys may fail or be harder to maintain.
Questions for Discussion
- Should new features be blocked from merging until at least the base translation keys are added?
- How do we want to handle updates when translations for all languages aren’t ready yet? For example, do we want to first merge with English placeholders and track missing translations separately? Or should we require at least key/value stubs for all supported languages?
- Can we automate parts of this workflow (potentially utilizing https://github.com/processing/p5.js-web-editor/issues/3609)?
Hi! The only way to properly solve this would be to add translations for features before merging. But obviously that will delay the merge. And the delays will increase in the future as more languages are added.
Do we have data on which are our most commonly used languages? We could translate our top 3 or top 5 before merging and then add the rest later.
For automation we could raise a single mega issue for missing translations and use the script to keep adding to-dos to the issue.
Hi @mhsh312. I am glad we are having this conversation, so we can come up together with new ideas about this relevant subject.
First of all, what I have in mind is a (much) simpler process: the person who makes a PR (be it a bug fix or new feature) would be primarily responsible for identifying and creating new translation/localization keys in the en-US locale only (and of course providing the proposed English text for them), which would always be the primary source of truth regarding i18n. PRs which do not follow this basic principle/rule simply would not be accepted.
This way new merges would not be affected by missing translations in any way, while providing a base code already prepared to support new translations easily.
When code is merged without fully supporting i18n in the base locale (en-US), it gets much harder later for others to identify and extract the translation keys. This happened to me recently, when working on this PR: https://github.com/processing/p5.js-web-editor/pull/3656. I had a situation where the only solution I thought of was to include an additional parameter (the t() function itself, used by React to provide the translation texts) in a particular function, potentially affecting other callers (I fortunately checked for that situation). I believe whoever did the original PR would have done it in a much easier/better/faster way. That's my whole point with this discussion.
I like your idea of automating this process for the remaining languages, maybe creating the corresponding issues (one per affected language) and the PR itself, marking missing translation keys in the particular l10n ("localization") file and removing unused keys (both done by the script I developed), so different people can work on them (in their particular languages).
For example, running the script against the es-419 locale with the current "develop" branch, my script clearly points out the need for translating 70 keys to Spanish (while also removing dozens of unused keys), like the 3 shown below:
"Help": {
  "Title": "Ayuda",
  "KeyboardShortcuts": "Atajos",
  "Reference": "Referencia",
  "About": "Acerca de",
  "ReportBug": "[NEEDS TRANSLATION - Nav.Help.ReportBug] Report a Bug",
  "ChatOnDiscord": "[NEEDS TRANSLATION - Nav.Help.ChatOnDiscord] Chat On Discord",
  "PostOnTheForum": "[NEEDS TRANSLATION - Nav.Help.PostOnTheForum] Post on the Forum"
}
And this is what a Spanish speaker sees today when using the editor (in production), showing the keys are really missing in that language:
The same applies to all remaining languages (besides en-US itself and pt-BR, which I have been working on over the last few weeks).
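A minimal sketch of the kind of sync step described above, for readers who want to see the idea in code. It is not the actual script from the PR: the `LocaleTree` type, the `syncLocale` name and the in-memory objects are assumptions made for illustration, and only the `[NEEDS TRANSLATION - <key>]` tag format is taken from the example in this thread.

```typescript
// Hypothetical sketch; names are illustrative and not from the real script.
type LocaleTree = { [key: string]: string | LocaleTree };

const MARKER = 'NEEDS TRANSLATION';

/** Returns a copy of `target` that has exactly the keys of `base` (en-US). */
function syncLocale(
  base: LocaleTree,
  target: LocaleTree,
  path: string[] = []
): LocaleTree {
  const result: LocaleTree = {};
  for (const [key, baseValue] of Object.entries(base)) {
    const fullPath = [...path, key];
    const targetValue = target[key];
    if (typeof baseValue === 'string') {
      // Keep an existing translation; otherwise tag the English text.
      result[key] =
        typeof targetValue === 'string'
          ? targetValue
          : `[${MARKER} - ${fullPath.join('.')}] ${baseValue}`;
    } else {
      // Recurse into nested namespaces; a missing namespace starts empty.
      const nested = typeof targetValue === 'object' ? targetValue : {};
      result[key] = syncLocale(baseValue, nested, fullPath);
    }
  }
  // Keys that exist only in `target` are never copied, so unused keys drop out.
  return result;
}

// Usage with tiny in-memory locales (the real files are much larger).
const enUS: LocaleTree = {
  Nav: { Help: { About: 'About', ReportBug: 'Report a Bug' } }
};
const es419: LocaleTree = {
  Nav: { Help: { About: 'Acerca de', Obsolete: 'Viejo' } }
};
console.log(JSON.stringify(syncLocale(enUS, es419), null, 2));
```

Because the base locale drives the loop, the output always mirrors the en-US key order, which should keep diffs between locale files small.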
Let me know if you have any doubts about my proposal and I will be glad to help.
Cheers.
I am also suggesting here (https://github.com/processing/p5.js-web-editor/pull/3656) that code which uses dynamic translation keys, like this one:
return (
  <a
    href={`#${targetId}`}
    className={linkClasses}
    onFocus={handleFocus}
    onBlur={handleBlur}
  >
    {t(`SkipLink.${text}`)}
  </a>
);
would not be accepted anymore. Fortunately the above code is the only one I found in the codebase.
My proposal is to always use a TypeScript union type to define all potential/valid values and a TS "exhaustiveness check" to constrain them and ensure each value has a corresponding translation entry. Something like:
import React, { useState } from 'react';
import { useTranslation } from 'react-i18next';
import classNames from 'classnames';

// Maybe in some other module.
export type SkipLinkOptions = 'PlaySketch'; // More could be added later, e.g. 'PlaySketch' | 'StopSketch'

interface SkipLinkProps {
  targetId: string;
  text: SkipLinkOptions;
}

export const SkipLink = ({ targetId, text }: SkipLinkProps) => {
  const [focus, setFocus] = useState(false);
  const { t } = useTranslation();
  const handleFocus = () => {
    setFocus(true);
  };
  const handleBlur = () => {
    setFocus(false);
  };
  const linkClasses = classNames('skip_link', { focus });

  // Each case uses a literal key, so tools can find it statically.
  let translatedText: string;
  switch (text) {
    case 'PlaySketch':
      translatedText = t('SkipLink.PlaySketch');
      break;
    default: {
      // Fails to compile if a union member has no case above.
      const exhaustiveCheck: never = text;
      throw new Error(`Unhandled case: ${exhaustiveCheck}`);
    }
  }

  return (
    <a
      href={`#${targetId}`}
      className={linkClasses}
      onFocus={handleFocus}
      onBlur={handleBlur}
    >
      {translatedText}
    </a>
  );
};
Please check the comments in that particular PR for more details.
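As a possible variation on the same constraint, a lookup table typed with `Record` also forces one entry per union member at compile time. This is only a sketch: `SKIP_LINK_KEYS` and `skipLinkKey` are hypothetical names, and whether a key-detection script would pick up literals that sit outside a direct `t('...')` call is an assumption that would need checking against the actual script.

```typescript
// Hypothetical alternative to the switch statement; names are illustrative.
type SkipLinkOptions = 'PlaySketch'; // e.g. 'PlaySketch' | 'StopSketch' later

// Record<SkipLinkOptions, string> does not compile if any member of the
// union is missing from the object, or if an unknown member is added.
const SKIP_LINK_KEYS: Record<SkipLinkOptions, string> = {
  PlaySketch: 'SkipLink.PlaySketch'
};

function skipLinkKey(text: SkipLinkOptions): string {
  return SKIP_LINK_KEYS[text];
}

// Inside the component this would be used as: t(skipLinkKey(text))
console.log(skipLinkKey('PlaySketch'));
```

The trade-off is that the key is no longer a literal argument to `t()`, so the switch version may be the safer choice if static extraction relies on that.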
Yeah this is a good idea. It will increase the complexity for each PR but in the long run will save a lot of time trying to hunt down missing (key-less) translations. Maybe we can finally put an end to the everyday "this part is not translated" PRs.
And if we agree on the automation of issue generation then I can take a crack at it after further discussion.
Good to know we are reaching an agreement. You have used the right verb: HUNT DOWN. Sometimes this can be a real nightmare for those who have not participated in the feature creation.
I have no idea on how to automate the issue or even the PR creation (already linked to the issue). There may be APIs for both that could be used in a script.
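For reference, GitHub's REST API does have a "create an issue" endpoint (`POST /repos/{owner}/{repo}/issues`). Below is a minimal sketch of how a script could prepare such a request. The function name, the issue title format and the `translation` label are all assumptions for illustration, and the token would have to come from wherever the script runs (for example a CI secret).

```typescript
// Hypothetical sketch: builds the request only, it does not send anything.
interface IssueRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildMissingTranslationIssue(
  repo: string, // "owner/name"
  token: string,
  locale: string,
  missingKeys: string[]
): IssueRequest {
  // One unchecked task-list item per missing key.
  const checklist = missingKeys.map((key) => `- [ ] \`${key}\``).join('\n');
  return {
    url: `https://api.github.com/repos/${repo}/issues`,
    init: {
      method: 'POST',
      headers: {
        Accept: 'application/vnd.github+json',
        Authorization: `Bearer ${token}`
      },
      body: JSON.stringify({
        title: `Missing translations for ${locale}: ${missingKeys.length} key(s)`,
        body: `The following keys need translation:\n\n${checklist}`,
        labels: ['translation'] // assumed label name
      })
    }
  };
}

// Usage (sending is left to the caller):
//   const { url, init } = buildMissingTranslationIssue(repo, token, 'es-419', keys);
//   const response = await fetch(url, init);
```

Keeping the request construction separate from the network call makes the script easy to dry-run before it is allowed to open real issues.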
When you say that "It will increase the complexity for each PR", I only partially agree, in the sense that it is much easier for those creating the feature to reason about their own (English) texts and place appropriate keys in the en-US l10n file. Maybe this will take an additional 10-20 minutes of their time, which is insignificant when compared to the feature itself. Do you agree?
I believe we are already answering the "Questions for Discussion" raised by @raclim.
Count on me to establish this new process, once all involved parties have been heard.
Take this PR (https://github.com/processing/p5.js-web-editor/pull/3644) as an example, where 2 German-related entries have been fixed. What about the other 76 (yes, seventy-six) keys which are still missing, as shown in the attached file (identified by my script)?
Who on Earth would hunt them down manually??? Think about it for a minute.
Someone who is fluent in German could easily translate them in a matter of minutes, especially when using AI inside VS Code, Cursor etc. AI would suggest the proper translations and the person would only review (and fix) them.
We could go a little further and periodically (e.g. once per week) generate an issue in an automated way containing a summary of all missing keys, one line per language and properly sorted. Something like:
| Language | Missing keys |
|---|---|
| en-US | 0 |
| pt-BR | 3 |
| es-419 | 70 |
| de | 76 |
| ... | ... |
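A summary table like the one above could be produced by a small helper along these lines. This is a sketch under assumptions: the function names are hypothetical, the locales are passed in as already-parsed objects, and a key only counts as missing when it is absent from the locale (entries tagged `[NEEDS TRANSLATION ...]` would need an extra check).

```typescript
// Hypothetical sketch; not the script mentioned in this thread.
type LocaleTree = { [key: string]: string | LocaleTree };

/** Flattens nested keys into dotted paths, e.g. "Nav.Help.About". */
function flattenKeys(tree: LocaleTree, prefix = ''): string[] {
  return Object.entries(tree).flatMap(([key, value]) => {
    const path = prefix ? `${prefix}.${key}` : key;
    return typeof value === 'string' ? [path] : flattenKeys(value, path);
  });
}

/** Lists keys that exist in `base` but are absent from `target`. */
function missingKeys(base: LocaleTree, target: LocaleTree): string[] {
  const present = new Set(flattenKeys(target));
  return flattenKeys(base).filter((key) => !present.has(key));
}

/** Renders a Markdown table sorted by ascending number of missing keys. */
function summaryTable(
  base: LocaleTree,
  locales: Record<string, LocaleTree>
): string {
  const rows = Object.entries(locales)
    .map(([name, tree]) => ({ name, missing: missingKeys(base, tree).length }))
    .sort((a, b) => a.missing - b.missing)
    .map(({ name, missing }) => `| ${name} | ${missing} |`);
  return ['| Language | Missing keys |', '|---|---|', ...rows].join('\n');
}

// Usage with tiny in-memory locales.
const base: LocaleTree = { A: 'a', B: { C: 'c' } };
console.log(summaryTable(base, { 'pt-BR': { A: 'x' }, 'en-US': base }));
```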
It took me less than 2 minutes to do it using VS Code and Copilot:
Of course this would have to be reviewed by someone fluent in that language (not my case).
Question: is this discussion valuable? Or am I going into too much detail?
Your suggestion gave me another idea. We could use an LLM to auto-translate whenever there are new/untranslated keys, and then if speakers of that language find a mistranslation, they can report it. We mostly have single words or short phrases in the UI, and an LLM can handle those pretty consistently, so I think the "mistranslation" issues will be far fewer than the "missing translation" issues. That way we won't have to wait for someone fluent in German to show up and do the translation. Until we get someone fluent in German, AI-generated translations would be a better placeholder than no translation at all.
> Question: is this discussion valuable? Or am I going into too much detail?
I think what you are suggesting is valuable but we need to go step by step.
> Your suggestion gave me another idea. We could use an LLM to auto-translate whenever there are new/untranslated keys, and then if speakers of that language find a mistranslation, they can report it. We mostly have single words or short phrases in the UI, and an LLM can handle those pretty consistently, so I think the "mistranslation" issues will be far fewer than the "missing translation" issues. That way we won't have to wait for someone fluent in German to show up and do the translation. Until we get someone fluent in German, AI-generated translations would be a better placeholder than no translation at all.

Yep, that would be a perfect fit. It would work like magic, and all of a sudden the editor would be fully translated into virtually any language, without any human intervention. I think it would take a little effort, but would be technically possible with the current state of LLMs.
Honestly we could compile it all and make it into a GSOC project this year or something. But that's way outside the scope of this conversation lol.
Please tell me more on how GSoC actually works. I live in Brazil, but could travel for a week or two in order to physically participate in such a program, if needed.
GSoC is a Google-sponsored open source fellowship. Organisations participate and submit projects. Individuals (usually students) then apply to work on those projects. Google then selects fellows to work on the projects and pays them a stipend. It's 3 months long, so you need a big project for it. It's a work-from-home type of deal.
But that's the choice of people working in Processing. I digress.
I think all valuable things have been said regarding the issue. Now we wait for @raclim 's comments.
Being frank, the suggested integration with LLMs wouldn't be that hard to accomplish. I could try to write a POC for it. It would certainly take me much less than 3 months. Maybe (just maybe) 1 or 2 days, if I dig into it.
Thanks for sharing your thoughts and inputs on this!
> First of all, what I have in mind is a (much) simpler process: the person who makes a PR (be it a bug fix or new feature) would be primarily responsible for identifying and creating new translation/localization keys in the en-US locale only (and of course providing the proposed English text for them), which would always be the primary source of truth regarding i18n. PRs which do not follow this basic principle/rule simply would not be accepted.

Agree with this, this is the principle we're aiming to follow at the moment! This could probably be emphasized further somewhere in the PR template or documentation.
> I like your idea of automating this process for the remaining languages, maybe creating the corresponding issues (one per affected language) and the PR itself, marking missing translation keys in the particular l10n ("localization") file and removing unused keys (both done by the script I developed), so different people can work on them (in their particular languages).

I think this makes sense, having a flow where issues are generated for the remaining languages would be really useful. I think there’s a tradeoff between creating one issue per language (better visibility and parallel work) versus one consolidated issue (better for issue consolidation/organization). I'd be down to look into an approach for this; it could start with a manual process and then move towards automation. An issue template for translation updates might also be needed here.
> Your suggestion gave me another idea. We could use an LLM to auto-translate whenever there are new/untranslated keys, and then if speakers of that language find a mistranslation, they can report it. We mostly have single words or short phrases in the UI, and an LLM can handle those pretty consistently, so I think the "mistranslation" issues will be far fewer than the "missing translation" issues. That way we won't have to wait for someone fluent in German to show up and do the translation. Until we get someone fluent in German, AI-generated translations would be a better placeholder than no translation at all.

I think this is a good suggestion to address gaps in translations, though, as noted in our Translation Documentation, we currently aim to have translations that aren’t based on machine-generated content, and we encourage fluent speakers to contribute directly. I also feel there’s potential value in leaving untranslated keys visible, since they can serve as an entry point for new contributors. Another approach might be to add a clear call to action in the editor itself, guiding users (especially those new to contributing) toward submitting translations.
Regarding Google Summer of Code, it’s still a bit far off and our participation in this program isn’t guaranteed, so I’d suggest keeping this idea independent of it for now!
Maybe we could auto-generate the translations, but flag them (literally) as such with an appropriate icon (e.g. the country flag 🇧🇷, 🇩🇪, 🇪🇸 etc.), which would implicitly be our "call to action" for a review by native speakers. Does that make sense to you?
Examples:
Or even:
(which is my preferred option)
IMHO, both are far better than the default (current) alternative, in use today:
It would certainly reach a much broader audience, going beyond those who can read English. This is for example what an Italian-speaking user sees in the About Page in the current/production version of the editor:
Not very helpful...
And the resulting page:
How do you feel about it now? Think of those who DO NOT read English.
We could also add a tooltip like "This is an automated translation, done by AI. If you see any relevant errors or would like to suggest an improvement please open a Merge Request and help us make this great product even better". Of course, it would be shown in the user's own language.
PS: even though I have Italian citizenship (besides my Brazilian one), I do not speak the language and have used an LLM to help me.
Once fully automated, supporting a completely new language would be possible with the push of a button.
That would depend on what the core reason is behind the no machine generated translation rule.
> That would depend on what the core reason is behind the no machine generated translation rule.
I see two distinct goals for the current rule regarding the use of automatic translations:
- Be an incentive for newcomers to contribute to the editor
- Avoid language translation errors
The proposed process here, IMHO, would not affect either of them, since:
- newcomers would still have a chance to correct any eventual mistranslations (all properly flagged with some sort of icon, like I suggested), making relevant suggestions; and
- the current state of LLMs does offer meaningful and valid translations in almost 100% of the cases.
We could even go ahead and provide a quick link so users could open a pop-up window and report mistranslations easily, by filling a simple form.
I quickly developed a POC, proving the idea really works. This is the result of running the script in my terminal:
✗ npm run translate Portuguese Brazil ./translations.json
> [email protected] translate
> npm run build && node translator.js Portuguese Brazil ./translations.json
> [email protected] build
> tsc
Translating './translations.json' to Portuguese...
Sending translation request to Gemini API...
{
prompt: 'Translate the following localization entries to `Portuguese` spoken in `Brazil`.\n' +
'Only translate entries which start with the tag "[NEEDS TRANSLATION - <key>]",\n' +
'focusing in the text after the tag and ignoring the tag itself. Do not include the tag in\n' +
'the translation. Never translate text inside double curly braces (e.g., {{example}}).\n' +
"Add the `Brazil` country's flag emoji at the end of each translated entry,\n" +
'preceded by a space. If the country does not have a specific flag, use the generic globe emoji 🌐.\n' +
'If the entry does not have the tag, do not translate it and leave it unchanged.\n' +
'Maintain the same JSON structure and formatting.\n' +
'Do not add or remove any entries, only translate the ones with the tag.\n' +
'Respond only with the translated JSON content, without any additional text or explanation.\n' +
'Make sure to keep the JSON valid.\n' +
'Note: `Portuguese` refers to the language name in English, e.g., "Spanish", "French",\n' +
'"German", "Chinese", etc.\n' +
'Here are the localization entries to translate:\n' +
'---- JSON START ----\n' +
'{\n' +
'"Cookies": {\n' +
'"Header": "Cookies",\n' +
'"Body": "O editor p5.js usa cookies. Alguns são essenciais para a funcionalidade do site e permitem que você gerencie uma conta e preferências. Outros não são essenciais - são usados para análises e nos permitem aprender mais sobre nossa comunidade. <strong> Nunca vendemos esses dados ou os usamos para publicidade. </strong> Você pode decidir quais cookies gostaria de permitir e saber mais em nossa <0>Política de Privacidade</0>.",\n' +
'"AllowAll": "[NEEDS TRANSLATION - Cookies.AllowAll] Allow all",\n' +
'"AllowEssential": "[NEEDS TRANSLATION - Cookies.AllowEssential] Allow essential"\n' +
'},\n' +
'"Legal": {\n' +
'"PrivacyPolicy": "Política de Privacidade",\n' +
'"TermsOfUse": "Termos de Uso",\n' +
'"CodeOfConduct": "Código de Conduta"\n' +
'},\n' +
'"SkipLink": {\n' +
'"PlaySketch": "[NEEDS TRANSLATION - SkipLink.PlaySketch] Skip to Play Sketch"\n' +
'},\n' +
'"CopyableInput": {\n' +
'"OpenViewTabARIA": "[NEEDS TRANSLATION - CopyableInput.OpenViewTabARIA] Open {{label}} view in new tab"\n' +
'},\n' +
'"Visibility": {\n' +
`"Changed": "[NEEDS TRANSLATION - Visibility.Changed] '{{projectName}}' is now {{newVisibility}}..."\n` +
'}\n' +
'}\n' +
'---- JSON END ----'
}
Translation Results:
===================
```json
{
"Cookies": {
"Header": "Cookies",
"Body": "O editor p5.js usa cookies. Alguns são essenciais para a funcionalidade do site e permitem que você gerencie uma conta e preferências. Outros não são essenciais - são usados para análises e nos permitem aprender mais sobre nossa comunidade. <strong> Nunca vendemos esses dados ou os usamos para publicidade. </strong> Você pode decidir quais cookies gostaria de permitir e saber mais em nossa <0>Política de Privacidade</0>.",
"AllowAll": "Permitir todos 🇧🇷",
"AllowEssential": "Permitir essenciais 🇧🇷"
},
"Legal": {
"PrivacyPolicy": "Política de Privacidade",
"TermsOfUse": "Termos de Uso",
"CodeOfConduct": "Código de Conduta"
},
"SkipLink": {
"PlaySketch": "Ir para Executar Sketch 🇧🇷"
},
"CopyableInput": {
"OpenViewTabARIA": "Abrir a visualização de {{label}} em uma nova aba 🇧🇷"
},
"Visibility": {
"Changed": "'{{projectName}}' agora está {{newVisibility}}... 🇧🇷"
}
}
```
This shows that it is entirely feasible to automate the whole translation process. It wouldn't take me more than a few hours to fully automate the inclusion of a completely new language.
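Since the prompt asks the model to keep the JSON structure, leave untagged entries alone and preserve `{{...}}` placeholders, those rules can also be checked mechanically before a result is accepted. The sketch below is not part of the POC: `validateTranslation` and its messages are hypothetical, and it only covers the rules quoted in the prompt above.

```typescript
// Hypothetical post-check for an LLM response; names are illustrative.
type LocaleTree = { [key: string]: string | LocaleTree };

const TAG = /^\[NEEDS TRANSLATION - [^\]]+\] /;
const PLACEHOLDER = /\{\{[^}]+\}\}/g;

/** Returns the {{...}} placeholders of a string in sorted order. */
function placeholders(text: string): string[] {
  return (text.match(PLACEHOLDER) ?? []).sort();
}

/** Compares the request JSON with the response and lists rule violations. */
function validateTranslation(
  input: LocaleTree,
  output: LocaleTree,
  path = ''
): string[] {
  const problems: string[] = [];
  const keys = Array.from(
    new Set([...Object.keys(input), ...Object.keys(output)])
  );
  for (const key of keys) {
    const where = path ? `${path}.${key}` : key;
    const before = input[key];
    const after = output[key];
    if (before === undefined || after === undefined) {
      problems.push(`${where}: entry added or removed`);
    } else if (typeof before === 'string' && typeof after === 'string') {
      if (!TAG.test(before)) {
        // Untagged entries must come back byte-for-byte identical.
        if (before !== after) problems.push(`${where}: untagged entry changed`);
      } else {
        if (TAG.test(after)) problems.push(`${where}: tag still present`);
        if (placeholders(before).join() !== placeholders(after).join()) {
          problems.push(`${where}: placeholders differ`);
        }
      }
    } else if (typeof before === 'object' && typeof after === 'object') {
      problems.push(...validateTranslation(before, after, where));
    } else {
      problems.push(`${where}: structure differs`);
    }
  }
  return problems;
}
```

An empty result means the response respected the structural rules; it says nothing about translation quality, which still needs a fluent reviewer.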
You will see below some of the localization files completely translated using this process:
- Japanese: translations.json
- Italian: translations.json
- French (Canada): translations.json
- Spanish (Latin America): translations.json
- German: translations.json
On average, approximately 70 new entries were translated per locale.
Finally, here are examples of the About page in the various languages.
Japanese:
Italian:
French (Canada):
Spanish (Latin America):
German:
I am sure you got the idea. I am very excited about the opportunities that may arise from this and would like to share my thoughts with you, especially regarding the possibility of including new languages with minimal/no effort, allowing this great product to be used by a much broader audience worldwide.
Take Snap! (https://snap.berkeley.edu/snap/snap.html) from UC Berkeley (where I am also a contributor) as an example of another global product, currently supporting almost 50 languages:
How long would it take for p5.js to support such a large number of languages using the current, manual process?
I just created a Greek version of the editor, in a matter of minutes. Take a look at some screenshots:
How cool is that?
And a Russian version:
> We could even go ahead and provide a quick link so users could open a pop-up window and report mistranslations easily, by filling a simple form.
Doing a follow-up on this idea, I have detected and generated navigable links for all AI-generated translations, which carry all necessary information like:
- Current language;
- Clicked localization key; and
- Translated text
so we could open a form requesting the user to provide a translation suggestion. For example, considering the lower left link below:
I have used a very basic window.prompt() above, just as an example. We could instead display a much better UI, using a modal and a larger textarea, and save the responses in our backend in a (new) MongoDB collection, so they can be properly curated later by an admin.
This way non-technical users could contribute to translations in a much easier way when compared to today.
And the translation keys in this case would become something like:
"About": {
  "LinkDescriptions": {
    "Libraries": "[NEEDS REVIEW About.LinkDescriptions.Libraries] Επεκτείνετε τις δυνατότητες του p5.js με βιβλιοθήκες που δημιουργήθηκαν από την κοινότητα."
  }
}
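To make the marker format above concrete, here is a small sketch of how such an entry could be split into its parts and turned into a feedback link carrying the language, the key and the translated text. It is not the implementation from this thread: `parseEntry`, `feedbackUrl` and the `/translation-feedback` path are hypothetical.

```typescript
// Hypothetical sketch; the endpoint path and names are assumptions.
interface ReviewedEntry {
  needsReview: boolean;
  key?: string;
  text: string;
}

// Matches "[NEEDS REVIEW <key>] <text>" as used in the example above.
const REVIEW_MARKER = /^\[NEEDS REVIEW ([^\]]+)\] ([\s\S]*)$/;

function parseEntry(value: string): ReviewedEntry {
  const match = REVIEW_MARKER.exec(value);
  if (!match) return { needsReview: false, text: value };
  return { needsReview: true, key: match[1], text: match[2] };
}

/** Builds a link for reporting a mistranslation, or null for reviewed text. */
function feedbackUrl(language: string, entry: ReviewedEntry): string | null {
  if (!entry.needsReview || !entry.key) return null;
  const query = [
    ['lang', language],
    ['key', entry.key],
    ['text', entry.text]
  ]
    .map(([name, value]) => `${name}=${encodeURIComponent(value)}`)
    .join('&');
  return `/translation-feedback?${query}`;
}

// Usage
const entry = parseEntry('[NEEDS REVIEW About.Title] Hello there');
console.log(entry.text, feedbackUrl('el', entry));
```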
IMHO, it's an idea worth considering.
@raclim, please provide us with your opinions and valuable suggestions, so we can move further.
@raclim please assign me
> Doing a follow-up on this idea, I have detected and generated navigable links for all AI-generated translations, which carry all necessary information like:
> 1. Current language;
> 2. Clicked localization key; and
> 3. Translated text
Doing such transformations in React isn't trivial and, after 2 weeks and a lot of trial and error, I have achieved my intended initial goal by creating a "post processor" function in React's i18n lib, with some (necessary) changes to the application code itself.
I am very comfortable with my current solution and ready to share it with all interested parties.
I haven't seen any activity on this topic over the last 2 weeks. Is there still any interest in this subject?
Another good addition to the editor would be a "new language support" form, where users could suggest new language/country combinations, such as Greek/Greece, Russian/Russia, French/France, Spanish/Spain, Portuguese/Portugal, English/India etc.