ModelicaSpecification
MCP/0035 Multilingual support of Modelica
Created this PR to show that this topic is in the development phase. Comments are welcome.
The PR was a total mess. I rebased on master, resolved the conflicts, changed the spelling back to Readme.md and force-pushed. Please reset your branch and update only file Readme.md (and not ReadMe.MD).
@beutlich Thanks!
On a very general level, I think that the presentation would benefit from breaking out details as separate documents, compare https://github.com/modelica/ModelicaSpecification/tree/MCP/0039/RationaleMCP/0039.
The general direction is good; unclear points from the meeting:
- How detailed context? (Class or even component-name - but what about connect, annotations etc? - just class-scope seems to reduce implementation effort).
- Should the context be unique or should tools have to check all possible ones? Just class-scope minimizes implementation effort (and effort to update it), alternative is a clear list of all variants.
- Should we revisit the choice of GNU gettext after ten years? See for example XLIFF (https://en.wikipedia.org/wiki/XLIFF). Check actual use, available libraries for using them, etc. Relying on Qt does not seem good.
Next step: collect information about these steps, in particular last one ( @DagBruck ).
Regarding the last questions here, this is our (@gkurzbach and Olaf Oelsner) ESI ITI opinion:
How detailed context? (Class or even component-name - but what about connect, annotations etc? - just class-scope seems to reduce implementation effort).
As I remember it, the meaning of these notes is that one should be able to use a fully qualified path to a component as a more precise alternative to just giving the name of the class and the string to be translated. For example, consider:
model M
  record R
    Real x "Hello";
  end R;
  R r(x "Goodbye");
end M;
It would be good to clarify which of the candidate contexts below should be allowed:
Candidate contexts: "M.R.x", "M.R", "M"
msgctxt "M.R.x"
msgid "Hello"
msgstr ""

Candidate contexts: "M.r.x", "M.r", "M"
msgctxt "M.r.x"
msgid "Goodbye"
msgstr ""
We don't see the use case for making it that complicated. We have implemented it in some of our libraries, and in all of them we extract the strings using just the Modelica class, not components or modifications. For this example, "M.R" for "Hello" and "M" for "Goodbye" is fully sufficient.
A declaration at the component level (M.r.x or M.R.x) is quite granular and might have no real use. "Hello" would in any case be "Hallo" in German, and "Goodbye" would be "Auf Wiedersehen". Of course this example is relatively simple, but even words whose meaning depends on context are usually used in word groups, or are at least located in the same class, so the context cannot differ that much. So far we have not had an example where this caused a problem in the translation.
Another point: if "Goodbye" is a string in several components of a type, translating it just once saves a) translation effort and b) size of the resulting translation file.
Should the context be unique or should tools have to check all possible ones? Just class-scope minimizes implementation effort (and effort to update it), alternative is a clear list of all variants.
Needs to be discussed: Would "M" from the example above be sufficient to provide a translation for components in "M.R" (like "Hello")? This also affects the size of the translation data.
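If an enclosing class context were allowed to cover nested classes (the open question above), a tool could try the most specific class first and then walk outwards. The following is only a sketch of that option, assuming an in-memory catalog keyed by (msgctxt, msgid); the names enclosing_contexts and lookup_with_enclosing_contexts are hypothetical and not part of the MCP.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory catalog: (msgctxt, msgid) -> msgstr
Catalog = Dict[Tuple[str, str], str]


def enclosing_contexts(class_path: str) -> List[str]:
    """Class contexts from most to least specific, e.g. "M.R" -> ["M.R", "M"]."""
    parts = class_path.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]


def lookup_with_enclosing_contexts(catalog: Catalog, class_path: str, msgid: str) -> str:
    """Try the class itself, then each enclosing class; return msgid if nothing matches."""
    for ctxt in enclosing_contexts(class_path):
        msgstr = catalog.get((ctxt, msgid))
        if msgstr:  # an empty msgstr in a .po file means "not translated yet"
            return msgstr
    return msgid
```

With catalog = {("M", "Hello"): "Hallo"}, looking up "Hello" in class "M.R" finds "Hallo" via "M". The trade-off is the one raised above: fewer entries in the file, but a tool has to check several contexts per string.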
Size of the file: Having generated the translation template for MSL 4.0, we found that the file gets really big. We suggest adding the option to split it into one file per sub-library, located in the top-level library. That would simplify maintenance of the whole library. Example: Modelica.Blocks.pot and Modelica.Blocks.de.po inside Modelica/Resources/Language
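Under such a split, a tool would need a rule for which file covers a given class. The proposal does not state one; a plausible assumption is "most specific existing sub-library file wins, falling back to the top-level file". A minimal sketch of that assumed rule, using the naming pattern from the example (<package>.<lang>.po in Modelica/Resources/Language); po_file_for_class is a hypothetical name.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional


def po_file_for_class(
    class_path: str,
    lang: str,
    existing_files: Iterable[str],
    language_dir: str = "Modelica/Resources/Language",
) -> Optional[str]:
    """Return the most specific existing '<package>.<lang>.po' file covering class_path.

    Hypothetical rule: try the class path itself, then each enclosing package,
    down to the top-level library; return None if no file matches.
    """
    available = set(existing_files)
    parts = class_path.split(".")
    for i in range(len(parts), 0, -1):
        package = ".".join(parts[:i])
        candidate = str(PurePosixPath(language_dir) / f"{package}.{lang}.po")
        if candidate in available:
            return candidate
    return None
```

For example, with Modelica.de.po and Modelica.Blocks.de.po present, Modelica.Blocks.Continuous.PI resolves to Modelica.Blocks.de.po, while Modelica.Math.sin resolves to Modelica.de.po.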
I am not in favour of this idea, as it counteracts the gettext idea of having a centralized localisation per library. File size should not be relevant, as there are quite usable tools/libraries to load, translate and look up strings from translation files. Modularization shall happen on the actual library side, not the localisation side.
I updated Readme.md to have a discussion about the design choices, added Specification.txt file. We may discuss this in the next meeting.
I won't make it to the design meeting, but I want to let you know that I am not in favour of the one-translation-file-per-class option, for the reasons mentioned above. Also, I believe that parallel translation (to the same target language) is possible with the one-translation-file-per-library option (plus some strategy to avoid or handle potential merge conflicts). The advantage of the one-translation-file-per-library option is that common strings need to be translated only once. Again, modularization shall happen on the library side, not the localisation side, as otherwise it counteracts the main idea of the gettext approach.
Oh, sorry, I didn't mean to merge these directly into this file, but to have a review first. However, @gkurzbach, can you look through it afterwards? (And I will fix the problem with CI.)
@HansOlsson @gkurzbach As you both contribute to this PR, I'd appreciate it if you could reply to my concerns above. Otherwise it feels rather odd if you keep ignoring them. Thanks.
Sorry, I didn't understand your comment.
As I read it, you were in favor of the option chosen in the PR, which is good, and I didn't see the need to act. So, what did I miss?
@beutlich The current version of the specification text follows the one-translation-file-per-library scheme, and as I understand it, this is what you prefer in your comment. So we took it into account. Sorry for not answering directly.
Ah, sorry, I missed it since it was not that transparent from the commit history.
About the use case of moved classes: if a class is moved, the msgctxt changes when the template is regenerated. When running msgmerge (as described in https://github.com/ESI-ITI-GmbH/TranslationTest/blob/master/README.md), the existing translations are merged successfully with the updated msgctxt. (This is one of the benefits of using long-established tool chains.)
@gkurzbach @ooer Have a look at https://github.com/ESI-ITI-GmbH/TranslationTest/pull/5 where I tried to address an issue with double quotes in description strings.
@HansOlsson, wouldn't it be quite useful to have links to the main MCP PRs (like this one) from the MCP overview table at https://github.com/modelica/ModelicaSpecification/tree/master/RationaleMCP#list-of-existing-mcps? Right now, for example, this PR is the right place to draw the attention of potential MCP reviewers.
I tried building this branch with pdfLaTeX, and the console output was drowning in over 300 fancyhdr warnings about problems that have nothing to do with this PR. To get rid of these warnings, one could merge master into the MCP branch.
This looks good.
As I see it, there are a number of design issues:
- Which translation format to use. I can see an advantage with other formats, but gettext should work.
- Per top-level package or split more. The decision to go with per top-level package makes sense to me.
- How much is context vs msgid? This uses the class-name as context, an alternative would be to have a longer context, but I think this works.
- What strings are translated? This clearly defines the strings to translate and requires that all are translated. I can see alternatives with adding more - or making some optional, but this seems good. Adding more strings would increase the cost - for something that is rarely used. Making some translations optional can be done in various ways and may create a mess; so the current format seems ok - with one possible change (will add).
Looks good to me, besides the small issue with context I mentioned above.
A number of interesting suggestions have been made above. Should we try to set up an online meeting so that they can be resolved in a way that leaves all reviewers satisfied? Then we could also discuss when and how to take the next step in the MCP process.
This is a good idea. There are some points which need a decision.
Result of dedicated phone meeting in MAP-Lang 2022-08-30
General goals:
- Translation should be completely external
- Follow the gettext specification, but only support a subset (msgctxt, msgid, msgstr).
Specifics:
- The text to translate is prepared as follows: escape sequences are handled, strings are concatenated, and the result is then encoded for gettext.
- State gettext version in text and don't link to pdf (we might later upgrade to later version, but don't specify that in the text).
- No need to use the gettext implementation, just the file format.
- Do not specify that tools should be able to create the translation template (drop line 453). (If we water it down it will just become a vague statement that it is good that tools can do it, and that doesn't gain us anything.)
- Generalize: "It contains all necessary information to translate all descriptions, but no translations."
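The preparation steps agreed above (handle escapes, concatenate, encode for gettext) could look roughly like this. It is a minimal sketch, assuming the caller has already extracted the bodies of the individual Modelica string literals (without their surrounding quotes) that make up one description, as in "a" + "b"; the function names are hypothetical, and details such as splitting long PO strings over several lines are ignored. Note that double quotes inside descriptions end up escaped as \" in the .po file.

```python
from typing import List

# Escape sequences allowed in Modelica string literals
MODELICA_ESCAPES = {
    "'": "'", '"': '"', "?": "?", "\\": "\\",
    "a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v",
}

# C-style escapes used when writing a double-quoted PO string
PO_ESCAPES = {
    "\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t", "\r": "\\r",
    "\a": "\\a", "\b": "\\b", "\f": "\\f", "\v": "\\v",
}


def unescape_modelica(body: str) -> str:
    """Resolve escape sequences in a Modelica string literal body (quotes already removed)."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\":
            if i + 1 >= len(body) or body[i + 1] not in MODELICA_ESCAPES:
                raise ValueError(f"invalid escape sequence at index {i}")
            out.append(MODELICA_ESCAPES[body[i + 1]])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def po_quote(text: str) -> str:
    """Encode text as a single double-quoted PO string."""
    return '"' + "".join(PO_ESCAPES.get(ch, ch) for ch in text) + '"'


def prepare_msgid(literal_bodies: List[str]) -> str:
    """Unescape each literal of a concatenation, join them, and PO-quote the result."""
    return po_quote("".join(unescape_modelica(body) for body in literal_bodies))
```

For example, the Modelica description "Say \"Hi\"" + "\n" becomes the PO string "Say \"Hi\"\n". Unescaping before re-encoding ensures that the same text always yields the same msgid, regardless of how it was split or escaped in the Modelica source.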
After searching in the documentation for how context is used I found the following:
- There is no specific code for "translation not found": in that case the return value is the msgid (without context). Note that it doesn't just return an equal string but actually the very same const char*, so one can test for pointer equality.
- If a context is specified, the API always uses it. Internally this is handled by forming a new id as msgcontext + "\004" + msgid and searching for it (at least I believe the separator is "\004").
- However, we could additionally state that we first look up with context and, if no translation is found, look up the id without context.
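The lookup behaviour described above can be mimicked in a few lines. This is a sketch in Python, not the gettext C API: it assumes an in-memory dict catalog, and catalog_key and lookup_in_context are hypothetical names. It forms the key with the "\004" separator and, when nothing is found, returns the msgid object itself (Python's analogue of returning the same pointer).

```python
from typing import Dict, Optional

# gettext builds the lookup key for a context as msgctxt + "\004" + msgid
CONTEXT_GLUE = "\x04"


def catalog_key(msgctxt: Optional[str], msgid: str) -> str:
    """Key under which an entry is stored; entries without context use the bare msgid."""
    return msgid if msgctxt is None else msgctxt + CONTEXT_GLUE + msgid


def lookup_in_context(catalog: Dict[str, str], msgctxt: str, msgid: str) -> str:
    """Look up msgid in msgctxt only; like gettext, return the msgid object itself if not found."""
    translation = catalog.get(catalog_key(msgctxt, msgid))
    return translation if translation else msgid
```

Because a missing translation returns the identical msgid object, a caller can detect "not found" with an identity check (msgstr is msgid), mirroring the pointer-equality test mentioned above.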
When doing this I'm also struck by:
- The gettext documentation doesn't look professional - context is described in https://www.gnu.org/software/gettext/manual/html_node/Contexts.html#Contexts
- I could only find the details above by looking at the source code: https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gnulib-local/lib/gettext.h;hb=7840858d59aa33b69cd52d2b5e05e1388224a030
- The plural handling is separate (we are not planning to use it) and is designed to be used by printf-routines... https://www.gnu.org/software/gettext/manual/html_node/Plural-forms.html#Plural-forms
However, as I see it the main points with this proposal are:
- Standardize what to translate, and context
- Have some simple file format for it
If we later find that gettext is too much of a mess it seems straightforward to export/import to another format.
State gettext version in text and don't link to pdf (we might later upgrade to later version, but don't specify that in the text).
My understanding of the meeting agreement was that it should be sufficient to just have the version written in the bibliography item, meaning that a reader of the specification text doesn't immediately have to start thinking about how dated the used gettext spec is.
That works.
Based on Hans' observation above, we should poll on the following.
What to do in case there is no particular translation for a msgid in the current Modelica class context?
- A: Do nothing (use msgid as is).
- B: Try looking up msgid without context.
Note that (A) implies that it is pointless to include translations without context in the .po files, as we have defined a context for every msgid.
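To make the difference between the two options concrete, here is a minimal sketch, assuming a catalog keyed by (msgctxt, msgid) where None stands for "no context"; translate and MissingTranslationPolicy are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import Dict, Optional, Tuple

# Hypothetical catalog: (msgctxt or None, msgid) -> msgstr
Catalog = Dict[Tuple[Optional[str], str], str]


class MissingTranslationPolicy(Enum):
    USE_MSGID = "A"            # A: do nothing, use msgid as is
    TRY_WITHOUT_CONTEXT = "B"  # B: retry the lookup without context


def translate(catalog: Catalog, class_name: str, msgid: str,
              policy: MissingTranslationPolicy) -> str:
    """Look up msgid with the class name as context, applying the chosen fallback policy."""
    msgstr = catalog.get((class_name, msgid))
    if not msgstr and policy is MissingTranslationPolicy.TRY_WITHOUT_CONTEXT:
        msgstr = catalog.get((None, msgid))
    return msgstr if msgstr else msgid
```

With only a context-free entry for "Hello", option A returns "Hello" untranslated and option B returns the translation. Under A, context-free entries are never read at all, which is why they would be pointless in the .po files.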
The conclusion for the question above at the phone meeting was that context is required: if the msgid is not found with its context, it is not found (and the msgid itself can be used).
@gkurzbach @ooer Is a mandatory context still in alignment with the 2019 prototype implementation of SimulationX? Please confirm.
Yes, it is. At most, our existing translation files might have to be adapted.
That rather sounds like a "Maybe" to me.
Our current translation files don't have to be changed; they use a context identifier (msgctxt "") for each translated string. (P.S.: Have a nice weekend, everyone!)