monorepo introduce importer/exporter APIs to replace "storage" plugins

This issue has been raised by @martin-lysk. We agreed that importers/exporters are the right way to go but decided at the Berlin Offsite in Oct 23 to work around the issue as long as possible. First users are now confused about why the storage plugins are limiting them.

Problem

Inlang's "provide your own storage plugin" setup leads to numerous issues:

inlang's features are limited by the provided storage plugin (a no-go)
users don't understand why certain features work or don't work with a provided storage plugin (e.g. https://github.com/inlang/monorepo/issues/1577#issuecomment-1791655712)

Proposal

Storing messages should be controlled by the SDK to ensure that inlang is not limited by external plugins.

loadMessages should be succeeded by importMessages
saveMessages should be succeeded by exportMessages

message-29sn82
-> json import: login.button
-> paraglide export: login_button
-> ios export: LOGIN_BUTTON
-> android export: login-button
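
The mapping above can be sketched as follows. This is only an illustration: `KeyStyle`, `segments`, and `exportKey` are hypothetical names, not SDK APIs, and the naming rules are inferred from the four example keys rather than from any spec.

```typescript
// Hypothetical sketch: none of these names exist in the inlang SDK.
type KeyStyle = "paraglide" | "ios" | "android";

// Split an imported key such as "login.button" into its word segments.
function segments(importedKey: string): string[] {
  return importedKey.split(/[.\-_]/).filter((s) => s.length > 0);
}

// Derive a platform-specific key from the imported key.
// Assumption: each target only differs in separator and casing.
function exportKey(importedKey: string, style: KeyStyle): string {
  const parts = segments(importedKey).map((s) => s.toLowerCase());
  switch (style) {
    case "paraglide":
      return parts.join("_");
    case "ios":
      return parts.join("_").toUpperCase();
    case "android":
      return parts.join("-");
  }
}

console.log(exportKey("login.button", "paraglide")); // login_button
console.log(exportKey("login.button", "ios")); // LOGIN_BUTTON
console.log(exportKey("login.button", "android")); // login-button
```

Deriving keys like this only works while the rules stay purely mechanical; storing the per-target key next to the message is the alternative when projects need exceptions.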

Pros

inlang is not limited by external plugins
import/export APIs can expose what features they support
multi-platform exports (export for iOS, Android, Paraglide) become possible
users are told what a target platform supports and what it does not.
we can optimize storage performance instead of naively calling saveMessages() and loadMessages()
importer/exporter plugins could store additional data, e.g. "message id ieb3s should be exported as login-button" or "ieb2s exists in the namespace file en/login.json". requires https://github.com/inlang/monorepo/discussions/1418

Cons

effort.
the introduction of https://github.com/inlang/monorepo/discussions/1418 should be completed with this change too. have a project.inlang folder to avoid massive scatter across a repo

Requirements

[ ] allow multiple importers and exporters to be used in a project (load and save messages only support one "storage" plugin at a time)
[ ] must allow for "namespacing" logic see https://github.com/inlang/monorepo/issues/1577#issuecomment-1791655712. otherwise existing projects can't migrate to inlang
[ ] https://github.com/inlang/monorepo/issues/1769#issuecomment-1830270023
[ ] when is export triggered? onSave [can be ignored for now]
[ ] how to deal with creation/deletion

Nov 03 '23 20:11 samuelstroschein

@martin-lysk this seems to be a great issue for you. after all, you raised this issue and now it is hurting our growth because users don't understand why feature limitations exist for different storage formats

Nov 03 '23 20:11 samuelstroschein

As discussed in Berlin, this would be an API which finally solves limitations by external plugins and is therefore a good thing.

What (breaking) code change does this mean?

Nov 03 '23 23:11 felixhaeberle

I try to get the whole picture - collecting the inputs from the tickets referenced it seems like you have a concept on how this should be integrated already. Before I make a proposal that might not meet your thoughts - Shall we have a kickoff about that issue @samuelstroschein?

Storing messages should be controlled by the SDK to ensure that inlang is not limited by external plugins.

So inlang should come "with batteries included" and defines a default way to load messages and save messages? Shall plugins be able to override this behaviour at all?

loadMessages should be succeeded by importMessages
saveMessages should be succeeded by exportMessages

So instead of loading and saving messages (persistence), plugins would import / export messages from sources like string.dict files or even APIs like poeditor/localize etc. The imported messages would then be managed by the inlang SDK and stored in the .inlang folder?

Storage Format: How would inlang store messages in the inlang folder?

Are the files supposed to be edited by users directly like it is the case at the moment? (compare https://github.com/inlang/monorepo/discussions/1464#discussioncomment-7448011)

Yes. You can use any storage format you'd like. For example, the JSON storage plugin https://inlang.com/m/ig84ng0o/plugin-inlang-json, which also reduces the clutter of the inlang message format plugin and makes it easier to write translations manually. This question matters for deciding which format we use.

How do we store the data:

What format

using the JSON encoded AST like in (https://inlang.com/m/reootnfj/plugin-inlang-messageFormat)
using the message format schema from mf-wg https://github.com/unicode-org/message-format-wg/tree/main/spec/data-model
other format

How do we split data

1. Store everything in one big JSON file: all messages with their locales/variants

Pro

straight forward - just dump the whole AST json into a file like we do in https://inlang.com/m/reootnfj/plugin-inlang-messageFormat
loading messages means just load one file - easy
...

Con

pulling a change of a single message (done by another editor or push to repo) means fetching all messages with all variants and loading the whole file for now
merge conflicts - two edits of different messages might lead to merge conflicts until lix understands the format
loading messages means just load one file - if the project contains thousands of messages with dozens of languages and variants this might become a memory issue
...

2. Store Messages split by languages / split by namespaces

Pro

smaller files
if separated by namespace one could load only a subset of messages by a given namespace without loading the whole file
files could get handed over to translators by language/namespace
devs are used to this kind of separation

Con

motivated by current status quo not by the needs of a storage format
leads to manual edits of the files that might not be wanted at all
if a problem occurs in one message (for example a user edited the file manually - or a tool had an error with encoding) it breaks the whole format - to reverse this the user has to look into huge files

My thoughts: This is how those files are often stored in the old world to be able to deliver only messages on websites or to load messages in memory only for the current language (ios language bundles). Since inlang is usually interested in a message as a whole - all its properties (languages / variants) splitting it this way doesn't make sense for the storage format.

The use case for translators should be managed by import/export plugins instead.

3. Store each message with its locales and variants in a separate file

Pro

each message is an atomic entity - on fs level already. As long as you don't edit variants or languages of a message at the same time, you don't have to deal with merge conflicts (even without lix - semantic meaning). Even if we have simultaneous edits, a last-write-wins approach wouldn't hurt too much
versioning comes out of the box by git's version history
updates to a message get propagated via the file watcher
the format could be checked against the message format schema (not really an argument - nothing prevents us from using the schema in our own schema that wraps the messages with a map)
loading a subset of keys in large projects could be done by filtering filenames
the api we design around this format is more likely to be similar to the one we have when lix can store it as a whole
if a problem occurs in one message (for example a user edited the file manually - or a tool had an error with encoding) it breaks the whole format - to reverse this the user has to look into huge files
...

Cons

initial load would need to load all files in a folder - thousands of fs.read's instead of one if the project contains thousands of messages
git might become slow (compare https://www.monperrus.net/martin/one-million-files-on-git-and-github)
we add a lot of files to the inlang folder - people using inlang may not like that
...
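
A rough sketch of the "filter filenames" idea from option 3. It assumes one `<messageId>.json` file per message; that naming scheme and both function names are hypothetical, nothing here is decided. The functions work on a plain list of names, so the directory listing itself is left to the caller.

```typescript
// Assumption (not decided in this thread): each message is stored as "<messageId>.json".
function messageIdFromFile(fileName: string): string | undefined {
  const suffix = ".json";
  return fileName.endsWith(suffix) ? fileName.slice(0, -suffix.length) : undefined;
}

// Keep only the files that belong to the requested message ids,
// so that only those files need to be read from disk.
function selectMessageFiles(fileNames: string[], wanted: ReadonlySet<string>): string[] {
  return fileNames.filter((name) => {
    const id = messageIdFromFile(name);
    return id !== undefined && wanted.has(id);
  });
}

const files = ["human_airplane_globe.json", "ieb3s.json", "ieb2s.json", "README.md"];
console.log(selectMessageFiles(files, new Set(["ieb3s", "ieb2s"])));
```

This keeps the subset selection O(number of files) without opening any file, which is the point of the pro above; the thousands of reads on initial load remain a con.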

Nov 08 '23 10:11 martin-lysk

hey @martin-lysk

The "inlang directory" change should happen before the importer/exporter stuff https://github.com/inlang/monorepo/discussions/1418.
I have yet to write a proposal for the directory stuff. Planned to do that early next week.
The directory proposal will make changes to the message format really easy so we don't have to worry about one big json or not

When would you start with this/the to be proposed directory change so that I know when to write my proposal ?

Nov 08 '23 14:11 samuelstroschein

I guess the planned iterations in https://github.com/inlang/monorepo/issues/1459#issuecomment-1801904162 will keep me busy this week - I could have a look next week tuesday.

Nov 08 '23 18:11 martin-lysk

@martin-lysk okay, I intend to publish my "inlang directory" proposal Monday/Tuesday, which I would give you to implement. Afterwards, the import/export stuff can be handled

Nov 08 '23 18:11 samuelstroschein

@janfjohannes

Nov 10 '23 11:11 martin-lysk

@samuelstroschein @martin-lysk @janfjohannes What's the status here? Who is leading the importer/exporter & should be assigned?

Nov 23 '23 15:11 felixhaeberle

@felixhaeberle see Samuel's comment: https://github.com/inlang/monorepo/issues/1585#issuecomment-1802405547

Nov 23 '23 16:11 janfjohannes

@felixhaeberle my directory implementation will come first. progress can be tracked in the https://github.com/inlang/monorepo/tree/1678-project-directory branch

Nov 23 '23 16:11 samuelstroschein

Another take on the splitting proposals of martin:

While working with @NiklasBuchfink on the editor/sdk watcher we saw that not having granularity of messages is a nightmare. We watch files with thousands of messages. But we can only lint per message so as you can imagine that leads to a lot of reactive work. If we would have a solution like proposal 3 we can watch for messages and then only update and lint one message. That would be a huge advantage.

I see the problem, as Samuel said, that we could then run into trouble with a lot of files. Building a granular watcher that works with one file but watches every message could solve the problem too, by shifting complexity to the watcher.

-> Being able to watch for only one message should be a requirement for the SDK improvements.

Dec 04 '23 08:12 NilsJacobsen

We need granular reactivity per pattern that changes. Updating a single pattern should only have the linting for that particular pattern as a side effect. A reactivity matrix is needed where we can observe changes in individual patterns and apply CRUD operations. A mappable watcher that acts like a proxy over the files would be the dream. Basically, we need the same approach for file watching as SolidJS has for its reactivity system. Avoid diffing by creating a pub/sub pattern for each small entity that is reactive. That's why the normal watcher is the bottleneck or we have to split everything into a thousand files, which has its own limitations as described in suggestion 3.
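
A minimal sketch of the pub/sub idea, assuming an in-memory store keyed by message id. `MessageStore`, `subscribe`, and `update` are made-up names for illustration; this is not how the SDK or the watcher is implemented.

```typescript
// Hypothetical sketch of per-message pub/sub; not an existing SDK API.
type Listener = (pattern: string) => void;

class MessageStore {
  private readonly patterns = new Map<string, string>();
  private readonly listeners = new Map<string, Set<Listener>>();

  // Subscribe to a single message id; returns an unsubscribe function.
  subscribe(id: string, listener: Listener): () => void {
    let set = this.listeners.get(id);
    if (set === undefined) {
      set = new Set<Listener>();
      this.listeners.set(id, set);
    }
    set.add(listener);
    const owned = set;
    return () => {
      owned.delete(listener);
    };
  }

  // Notify only the subscribers of the message that actually changed.
  update(id: string, pattern: string): void {
    if (this.patterns.get(id) === pattern) return; // unchanged: no lint run needed
    this.patterns.set(id, pattern);
    this.listeners.get(id)?.forEach((listener) => listener(pattern));
  }
}

const store = new MessageStore();
store.subscribe("login_button", (pattern) => console.log("lint login_button:", pattern));
store.update("login_button", "Log in"); // triggers the one subscriber
store.update("checkout_pay", "Pay now"); // nobody subscribed, no lint work
```

Because listeners are looked up by id, an update costs only the work for that one message and no diffing of the whole file is needed. Mapping file changes to `update` calls is the part that would move into the watcher.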

Dec 04 '23 12:12 NiklasBuchfink

@NiklasBuchfink i created https://github.com/inlang/monorepo/issues/1817. Let's keep this issue for importer/exporter only.

Dec 04 '23 16:12 samuelstroschein

@samuelstroschein what's your take on the scope here? Shall we just add importers/exporters to the inlang SDK and keep the storage topic completely out of the scope of this issue? If so, we would still need a plugin that provides load and save methods, right?

Open questions:

What will become the execution points for importers / exporters - when shall we trigger an import or an export?

Only If storage is part of the ticket (if not we can answer those later):

should the change be backward compatible / should the plugins' loadMessages and saveMessages be marked as deprecated as part of this?
since we plan to reimplement the save logic and it will most likely need a migration for existing projects, I would iterate on the target persistence format first

Thoughts on import/export triggers:

Compared to the current setup, which only accepts one load and one save, importers and exporters will coexist. A project may have an iOS exporter, an Android exporter, and a JSON exporter all configured at once. Unlike save, which is triggered on each change, and load, which is executed initially and by the watcher, exports/imports are usually triggered externally and not by events coming from the SDK:

Use cases for Export Plugins

triggers when configured within a ci pipeline as part of the cli
a button within the editor to trigger an export
a button in the editor that allows a developer to download the localization files
... any hooks in the sdk that an exporter should be triggered on?

Use cases for Import Plugins

a cli command that allows to import all keys from an existing set of messages - like ios/android/....
an external webhook integration (like one from lokalise https://developers.lokalise.com/docs/webhook-events#projectkeymodified that updates the keys) - this would most likely only be a trigger to an import
an upload of a file like an iOS strings file within the editor
... do you see any triggering of an import other than external ones?
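
To make the "several exporters, triggered externally" idea concrete, here is a small sketch. The `Exporter` shape, `runExporters`, and both example exporters are assumptions for illustration only; the actual plugin API was not decided in this thread.

```typescript
// Hypothetical shapes; the real importer/exporter plugin API is still open.
interface Message {
  id: string;
  text: string;
}

interface Exporter {
  id: string;
  exportMessages(messages: Message[]): string;
}

const jsonExporter: Exporter = {
  id: "json",
  exportMessages(messages) {
    const out: Record<string, string> = {};
    for (const m of messages) out[m.id] = m.text;
    return JSON.stringify(out);
  },
};

const androidExporter: Exporter = {
  id: "android",
  // Simplified: no XML escaping and no <resources> wrapper.
  exportMessages(messages) {
    return messages.map((m) => `<string name="${m.id}">${m.text}</string>`).join("\n");
  },
};

// Called by an external trigger (CLI command, CI step, editor button),
// optionally restricted to a subset of the configured exporters.
function runExporters(exporters: Exporter[], messages: Message[], only?: string[]): Record<string, string> {
  const result: Record<string, string> = {};
  for (const exporter of exporters) {
    if (only !== undefined && only.indexOf(exporter.id) === -1) continue;
    result[exporter.id] = exporter.exportMessages(messages);
  }
  return result;
}

const messages: Message[] = [{ id: "login_button", text: "Log in" }];
console.log(runExporters([jsonExporter, androidExporter], messages, ["android"]));
```

The SDK never calls `runExporters` on its own in this sketch; whoever triggers the export decides which exporters run, which matches the use cases listed above.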

Dec 05 '23 14:12 martin-lysk

@martin-lysk do you think a sync call where we draft spec in google docs is quicker than github back and forth? If so, let's schedule a call

Dec 05 '23 14:12 samuelstroschein

Issue #1844 will be completed before this one.

Dec 08 '23 21:12 samuelstroschein

This proposal is a reaction to https://github.com/inlang/monorepo/issues/1844#issuecomment-1860894998

Proposal - introduction of aliases via a map

Introduce an alias map plugins can use to establish a relationship between message id and exported/imported key name.

Pros

plugins can make use of import/export keys
apps can display and edit "alias for i18next is login.button" because the plugin id is known

Cons

?

const message = {
  id: "human_airplane_globe",
+  alias: {
+    "plugin.inlang.i18next": "login.button",
+    "plugin.inlang.xml": "login-button"
+  }
}
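
A sketch of how a plugin might read the alias map above when exporting. `exportKeyFor` is a hypothetical helper, and falling back to the message id when no alias exists is an assumption, not part of the proposal.

```typescript
// Hypothetical message shape based on the alias proposal above.
interface Message {
  id: string;
  alias?: Record<string, string>;
}

// Look up the key a given plugin should use for a message.
// Assumption: when the plugin has no alias entry, the message id is used.
function exportKeyFor(message: Message, pluginId: string): string {
  return message.alias?.[pluginId] ?? message.id;
}

const message: Message = {
  id: "human_airplane_globe",
  alias: {
    "plugin.inlang.i18next": "login.button",
    "plugin.inlang.xml": "login-button",
  },
};

console.log(exportKeyFor(message, "plugin.inlang.i18next")); // login.button
console.log(exportKeyFor(message, "plugin.inlang.xml")); // login-button
console.log(exportKeyFor(message, "plugin.inlang.ios")); // human_airplane_globe
```

Keying the map by plugin id is what lets apps show "alias for i18next is login.button" without knowing anything else about the plugin.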

Dec 18 '23 15:12 samuelstroschein
