js-lingui improvement: Context

Hello 👋

@JSteunou and I needed the context to work for our company so we tried to fix the different issues we encountered with the current implementation.

We tried to follow the same "schema" for the internal structure and limit the changes.
It works (closes #1198) but it has some flaws.

For an example of the structure, for this file, it gives the following on extraction :

{
  "Component A": {
    "extractedComments": [],
    "origin": [["collect/componentA/componentA.js", 1]]
  },
  "Hello World": {
    "extractedComments": ["Comment A", "Comment A again"],
    "origin": [
      ["collect/componentA/componentA.js", 2],
      ["collect/componentA/componentA.js", 3]
    ]
  },
  "world ctxt": {
    "Hello World": {
      "extractedComments": [],
      "origin": [["collect/componentA/componentA.js", 4]]
    }
  }
}

With this, we no longer have a flat structure and context add another level.
We now have at the same level a key that can represent a message id or a context.
At various location, I added a test to know which one I'm manipulating.
If a user needs to translate the words origin or translation with a context, it breaks.
This structure also add lots of complexity.

Before we fix the remaining issues, what do you think ?
Should we change something ?

Some solutions we thought about but wanted your impressions first :

change the key to add the context like described here
add some default context in the structure :

{
  "_": {
    "Component A": {
      "extractedComments": [],
      "origin": [["collect/componentA/componentA.js", 1]]
    },
    "Hello World": {
      "extractedComments": ["Comment A", "Comment A again"],
      "origin": [
        ["collect/componentA/componentA.js", 2],
        ["collect/componentA/componentA.js", 3]
      ]
    }
  },
  "world ctxt": {
    "Hello World": {
      "extractedComments": [],
      "origin": [["collect/componentA/componentA.js", 4]]
    }
  }
}

For a complete example, I updated one of the repository example (the ^3.14.1 version references a local version I built).

Sep 14 '22 10:09 nmorel

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Updated
js-lingui	✅ Ready (Inspect)	Visit Preview	Sep 14, 2022 at 4:39PM (UTC)

Sep 14 '22 10:09 vercel[bot]

As @nmorel said, it fixes the primary issue without changing the logic & structure already in place.

I think it would be better to focus on this here, and if some changes should be made, tailor this in another PR.

Sep 14 '22 12:09 JSteunou

One thing I forgot to add, I'm not used to ramda 😅
So there is a few places where I used vanilla JS instead.

Sep 14 '22 15:09 nmorel

Will review this afternoon guys, thanks for the hard work!

Sep 20 '22 13:09 semoal

@semoal is something in that PR what stops it from being merged?

Dec 12 '22 17:12 timofei-iatsenko

@thekip could you please review this PR? 🙂

@nmorel It would be great if you could rebase the branch as it currently has conflicts that need to be resolved, sorry for the inconvenience

Jan 06 '23 13:01 andrii-bodnar

@andrii-bodnar I prefer to get feedback on the PR first before putting any more work on current branch.
The fix isn't perfect and I still have some unanswered questions.

Jan 06 '23 14:01 nmorel

@andrii-bodnar @nmorel I reviewed the code in general it looks good to me.

Here are my thoughts (doesn't mean you have to do this, just want to rise an opinion):

It's probably would be better to add in the docs more real-life examples. Right now, docs for context would be good only for people how already know what context is and would look for that in the docs.

Example from other i18n project docs which I've found helpful:

The same text elements with different meanings are extracted with different IDs. For example, if the word "right" uses the following two definitions in two different locations, the word is translated differently and merged back into the application as different translation entries.

correct as in "you are right" direction as in "turn right" If the same text elements meet the following conditions, the text elements are extracted only once and use the same ID.

Relaying on the field origin and translation feels a bit dirty and fragile. I like option with default context in internal structures. So lookup would always be as catalog[context || DEFAULT_CONTEXT][messageId]. This also would be nicer in the code as you will not have to copying these checks everywhere (message.hasOwnProperty('origin') || message.hasOwnProperty('translation') || message.context)

But actually i want to rise a bigger discussion here, read at the end of the comment.

How context would be serialized in formats other than .po? I didn't check the code in action, unfortunately. Would it bring breaking changes in already existing catalogs?

Discussion:

Implementation offered in this PR is ok just as quick workaround for such essential feature as context. And this is fine. However, I think better implementation is needed here, but it would require bigger changes in sourcecode and how the lib would be used.

Part of this was already discussed here, I want to summarize.

Current lingui implementation has a flaw: you have your original message in the bundle at least 2 times + 1 translation. For a line "Hello world" you would have one in the source code as id in i18n call. Another one as key in the message catalog and then a translation itself. Strings could be very long, not just a couple lines, so this may bring more kb to the bundle.

This PR would also add even more kilobytes because for each msg which should have a context you would have context explicitly written in the bundle + in the msg catalog.

A much better option would be generating a "stable" ID based on the msg + context as hash with fixed length. So internal structure would look like

{
 [<hash>]: {..}
}

And lookups in runtime would be much simpler.

Hash would be calculated at build time by macros + compile time for catalogs. So macros instead of

const message = t({
   context: 'My context',
   message: `Hello`
})

// ↓ ↓ ↓ ↓ ↓ ↓

import { i18n } from "@lingui/core"
const message = i18n._(/*i18n*/{
   context: 'My context',
   id: `Hello`
})

Would generate:

import { i18n } from "@lingui/core"
const message = i18n._(/*i18n*/{
   id: "<hash(message + context)>",
   message: `Hello`, // only for development purpose, would be dropped in production
})

Compile catalog command would also calculate hashes from msg + context on the fly. So context would affect only how msg id generates and will not have it's place in the bundle.

However, there are amounts of users who use a lib without a macros. I don't know who they are and what were the reasons behind this decision. I think if you don't use a macro, you lose everything what making lingui awesome. Without a macro there are better alternatives with more stars and bigger community. So I think we should concentrate on one thing and do it well instead of supporting multiple different usages.

This optional status of the macro and necessity of implementing all existing features in 2 modes adds additional complexity to the project and prevents the possibility of implementing some really cool features because we always should think about "runtime" users.

Anyway, it would be great to know how many of users don't use a macro and how painful would be dropping support of the non-macro usage.

@andrii-bodnar what do you think? We can discuss dropping non-macro usage separately somewhere. Depending on this, we can later decide a what to do with this PR.

PS this proposal with build-time id generation still can be used without dropping non-macro users, but it would require 2 runtimes, 2 sets of tests and confusing setup options, that's why I want to avoid it. And avoid generating hashes at runtime as well.

Jan 08 '23 18:01 timofei-iatsenko

I read test snapshots carefully one more time and the following came up. If we bring "a default" context as was proposed, this brings "nested" structures to these catalog formats:

Lingui (Raw) and Simple JSON catalog formats https://lingui.js.org/ref/catalog-formats.html#json.

If we do that, we also need to provide an upgrade path for the users. This should be relatively simple. However, if nested structure would be consumed by some translation tools, such as crowdin, it's probably would be treated as one entry, so it may break a flow for some.

So I would say context feature not really compatible with supported JSON formats.

Also generating a "hashed" id as proposed by me would definitely break simple JSON format flow, because key would no longer be an original string.

That's another questions to discuss.

Jan 08 '23 18:01 timofei-iatsenko

Very well summarized on the structure. This PR was careful to not bring any breaking changes and only use what was previously attempted (but not finished) on context support. That's why there is no default context that would change the all structure to add one level in depth for everyone, not only context users.

I totally agree that a better way to handle the internal structure (and json format) would be with key as hash. So instead of having 2 breaking changes in a row, I think it would be better to skip the default context improvement and immediately jump on all flat with key as hash.

Jan 09 '23 08:01 JSteunou

For the default context, it's possible I think to introduce it without breaking change.
It will only be present in the internal code. When we write to a file (json/js/po..), the default context can be removed.

Jan 09 '23 13:01 nmorel

@nmorel could you elaborate on this more?

Jan 09 '23 15:01 timofei-iatsenko

If I take examples from here, we will have this with the current PR :

po

#: src/App.js:3
#  Comment for translators
msgid "messageId"
msgstr "Translated Message"

#: src/App.js:4
#  Comment for translators
msgctxt "messageCtxt"
msgid "messageId"
msgstr "Translated Message with ctxt"

json

{
  "messageId": "Translated Message",
  "messageCtxt": {
    "messageId": "Translated Message with ctxt"
  }
}

raw

{
  "messageId": {
    "translation": "Translated message",
    "defaults": "Default message",
    "description": "Comment for translators",
    "origin": [["src/App.js", 3]]
  },
  "messageCtxt": {
    "messageId": {
      "translation": "Translated Message with ctxt",
      "defaults": "Default message",
      "description": "Comment for translators",
      "origin": [["src/App.js", 4]]
    },
  }
}

We can keep this format and only introduce the default context internally with a structure like this :

{
  "<user context or default context>": {
    "<message id>": {
      "translation": "<message translation>",
      "defaults": "<default message>",
      "description": "<comments>",
      "origin": [<origin1>, <origin2>]
    }
  }
}

When parsing po files, if there is no context, we set the default context to the message.
When parsing json files, if the value is a string, we set the default context to the message.
Raw files is the hardest to parse. We could test origin/translation fields like the PR to assign the default context or introduce a special property to context to recognize them easily :

{
  "messageCtxt": {
    "__isContext__": true,
    "messageId": {
      "translation": "Translated Message with ctxt",
      "defaults": "Default message",
      "description": "Comment for translators",
      "origin": [["src/App.js", 4]]
    },
  }
}

When converting the internal structure to po/json/raw files, it won't be too hard to remove the default context.

With this, we keep the format intact (except for message with context) and in lingui code, the structure is the same.
We no longer need to test if the object we manipulate is a message or a context containing messages.

But if you want to introduce the hash solution soon, is it really necessary ? 🙂

Jan 09 '23 16:01 nmorel

@nmorel @thekip thanks a lot for your involvement here!

I'm still learning the Lingui and it's a bit hard to make decisions about the concerns above. @Martin005 maybe you will have some thoughts?

Anyway, we'd like to introduce this feature in the nearest release. It's important to keep it backward compatible and not to provide any breaking changes for now.

Could you please guys briefly summarize what is currently blocking this PR?

And a small general question - what's the difference between comment and context here? For me, it has quite a similar purpose - providing some additional information for translation about the string

Jan 10 '23 10:01 andrii-bodnar

@andrii-bodnar the problem here if we release current implementation we would be forced to maintain it as it is right now. But the current approach has many flaws and will limit us in the implementing of new features.

So instead of making a release right now and returning to this question again and again in future we propose to implement it properly from the begining and probably ship it as version 4.

And a small general question - what's the difference between comment and context here? For me, it has quite a similar purpose - providing some additional information for translation about the string

This is a common for i18n process to have both. They might be called differently, f ex. meaning / description or context / comment as in Lingui. The purpose of context is provide few different translation for one source message:

correct as in "you are right" direction as in "turn right"

Unlike context, comment don't affect message id generation, and used just as an additional information for the translator. For example context could be "direction" where comment could be "used on the header in metablock, please keep it 20 symbols length max"

Jan 10 '23 13:01 timofei-iatsenko

Just to throw another stone here: it seems better to do things right, with a v4, if the v4 is really coming soon. But if it is coming soon, then there is no worries about supporting the v3, so the argument about being forced to maintain it turn void. So ship it quick and iterate would be better and non blocking.

On the contrary if the v4 is not coming soon, then it is also better to ship it now in v3 (maybe with the changes @nmorel explain above).

TLDR; in both cases I think it is better for users to ship it now.

Feb 06 '23 11:02 JSteunou

Suppressed by https://github.com/lingui/js-lingui/pull/1440

Feb 15 '23 22:02 timofei-iatsenko