identity-idp
Flattens .yml files for i18n
What's in the Branch
Based on the challenges of merging multi-line, nested YML files for translation, this PR proposes flattening them into files with flat keys, so no indentation is needed.
- scripts/yml_to_flat_yml takes a batch of yml files and converts them to txt
- I18nFlatYmlBackend implements loading these files for the I18n gem
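The flattening step can be sketched as follows. This is a minimal sketch, not the PR's scripts/yml_to_flat_yml. It assumes the converter walks each nested hash depth-first, joins keys with dots, keeps the locale as the first key segment, and JSON-encodes each leaf so it fits on one line. flatten_translations and to_flat_lines are hypothetical names, and arrays are emitted as numeric index segments so they can be rebuilt by the "all keys numeric" rule in the format specification.

```ruby
require "json"
require "yaml"

# Hypothetical helper: turns a nested translation hash into
# { "dotted.key" => leaf } pairs. Array elements become numeric key segments.
def flatten_translations(value, prefix = nil, out = {})
  case value
  when Hash
    value.each { |k, v| flatten_translations(v, [prefix, k].compact.join("."), out) }
  when Array
    value.each_with_index { |v, i| flatten_translations(v, [prefix, i].compact.join("."), out) }
  else
    out[prefix] = value
  end
  out
end

# Emits one `key: "JSON string"` line per leaf, sorted by key so output is stable.
def to_flat_lines(nested)
  flatten_translations(nested).sort.map { |key, leaf| "#{key}: #{JSON.generate(leaf)}" }
end

nested = YAML.safe_load(<<~YML)
  es:
    forms:
      buttons:
        submit: "Enviar"
      steps:
        - "Uno"
        - "Dos"
YML

puts to_flat_lines(nested)
```

Keys sort lexicographically here, so an index like 10 would be written before 2; a loader applying the numeric-keys rule should order array elements by integer value rather than by line order.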
Next Steps
If we wanted to land this, we'd need to:
- [x] update our normalize-yaml JS script to have an option to parse/fix these files as well
- [x] update our use of i18n-tasks (need to make a compatible parser for it; it doesn't reuse the i18n ones 🙄)
Example
Example of how to generate one of these .txt files:
find config/locales -type f -name '*es.yml' | xargs ./scripts/yml_to_txt > config/locales/es.txt
Format specification
Uses JSON to encode one-line forms of strings. If we need newlines inside strings, we'll use the JSON escape \n:
string.key.with.parts: "JSON string version of value"
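Reading a line of this format back in can be sketched like this. It is a hypothetical sketch (parse_flat_line and parse_flat_file are illustrative names, not the PR's I18nFlatYmlBackend), assuming the key runs up to the first ": " (so keys never contain that sequence) and the rest of the line is a single JSON value.

```ruby
require "json"

# Hypothetical parser for one line of the flat format:
#   string.key.with.parts: "JSON string version of value"
# Splits on the first ": " only, so values may themselves contain ": ".
def parse_flat_line(line)
  key, json = line.chomp.split(": ", 2)
  raise ArgumentError, "malformed line: #{line.inspect}" if json.nil?

  [key, JSON.parse(json)]
end

# Reads a whole file's text into a flat { key => value } hash, skipping blank lines.
def parse_flat_file(text)
  text.each_line
      .reject { |l| l.strip.empty? }
      .map { |l| parse_flat_line(l) }
      .to_h
end
```

Because the value is decoded by JSON.parse, an escaped \n in the file becomes a real newline in the loaded string.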
To support arrays, I went with the rule "if all keys of a hash are numeric, it should be an array" (we have one case of a hash with mixed numeric and string keys, which stays a hash).
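The array rule can be illustrated with a small loader that rebuilds nested hashes from dotted keys. This is a hypothetical sketch (unflatten and arrayify are illustrative names, not the PR's backend code), assuming no key is both a leaf and a parent of another key.

```ruby
# Hypothetical inverse of the flattening step: rebuilds nested hashes
# from dotted keys, then applies the numeric-keys-means-array rule.
def unflatten(flat)
  root = {}
  flat.each do |dotted, value|
    *parents, leaf = dotted.split(".")
    node = parents.reduce(root) { |h, part| h[part] ||= {} }
    node[leaf] = value
  end
  arrayify(root)
end

# Recursively converts any hash whose keys are all digit strings into an
# array ordered by integer index; hashes with mixed keys stay hashes.
def arrayify(node)
  return node unless node.is_a?(Hash)

  converted = node.transform_values { |v| arrayify(v) }
  if !converted.empty? && converted.keys.all? { |k| k.match?(/\A\d+\z/) }
    converted.sort_by { |k, _| k.to_i }.map { |_, v| v }
  else
    converted
  end
end
```

Sorting by integer value keeps index 10 after index 2. Gaps in the indices are not preserved: the array is simply built in order from whichever indices exist.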
Alternative to handle arrays
Another approach could be a special symbol for arrays like:
a.b.c.#0: "first item"
a.b.c.#1: "second item"
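For comparison, a loader for the # marker syntax might look like this hypothetical sketch (insert_marked is an illustrative name). The marker makes the array decision explicit for each segment, so a hash whose keys merely look numeric is never turned into an array by accident.

```ruby
# Hypothetical loader for the alternative syntax, where array positions are
# marked explicitly (a.b.c.#0) instead of being inferred from numeric keys.
def insert_marked(root, dotted, value)
  parts = dotted.split(".")
  node = root
  parts.each_with_index do |part, i|
    # "#N" segments are array indexes (base 10, so "#08" is 8); others are hash keys.
    key = part.start_with?("#") ? Integer(part[1..-1], 10) : part
    if i == parts.length - 1
      node[key] = value
    else
      # The *next* segment decides whether this container is an array or a hash.
      node[key] ||= parts[i + 1].start_with?("#") ? [] : {}
      node = node[key]
    end
  end
  root
end
```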
I expect the identity-rails-i18n-webpack-plugin JavaScript package will need some updates here as well, since it's currently implemented to read the YAML files.
It seems like we don't have that many strings that include newlines. I wonder if the "values" here could just be raw text to avoid having to deal with JSON encoding and make them easier to edit by hand. This would encourage splitting large blocks up into multiple strings and handling formatting concerns in the views instead.
Alternatively, raw text with \n supported as a newline would probably be fine, since the actual text is unlikely to ever contain that sequence. You could even allow a literal \n via \\n or something. What I'm saying is we probably don't have to be perfect here, and maybe the ergonomics of a simpler editing format outweigh the need to allow strings that include the literal text "\n".
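The raw-text alternative can be sketched as a pair of tiny encode/decode helpers, assuming values are stored unquoted, with \n meaning a newline and \\n meaning the literal characters \n. decode_raw_value and encode_raw_value are hypothetical. To keep round trips unambiguous, this sketch doubles every backslash when encoding, which is slightly stricter than escaping only \\n.

```ruby
# Hypothetical decoder for the raw-text value format:
#   \n (backslash + n)     -> newline
#   \\ (two backslashes)   -> one literal backslash, so \\n decodes to the text "\n"
# Any other character, including a lone backslash, passes through unchanged.
def decode_raw_value(raw)
  raw.gsub(/\\\\|\\n/) { |m| m == "\\n" ? "\n" : "\\" }
end

# Inverse: escape backslashes first, then turn newlines into \n.
# Block forms avoid gsub's special handling of backslashes in replacement strings.
def encode_raw_value(text)
  text.gsub("\\") { "\\\\" }.gsub("\n") { "\\n" }
end
```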
After some messing around, it turns out that this format is valid YAML, so I switched it around and tools like Prettier can work with it. I pulled the telephony files back out into their own separate YMLs, since those do have explicit newlines and are a bit easier
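One way to check that the flat format is valid YAML, assuming Ruby's bundled Psych as the parser: load the same text once with YAML.safe_load and once line by line with JSON.parse, and compare. The sample keys and strings are made up for illustration.

```ruby
require "json"
require "yaml"

# Each line is `dotted.key: "JSON string"`. YAML double-quoted scalars accept
# the common JSON escapes (\n, \", \\), so a YAML parser reads the file as a
# one-level mapping whose keys still contain the dots.
flat_text = <<~'YML'
  forms.buttons.submit: "Enviar"
  telephony.intro: "Hola\nmundo"
  forms.quote: "Dijo \"sí\""
YML

from_yaml = YAML.safe_load(flat_text)

pairs = flat_text.each_line.map do |line|
  key, json = line.chomp.split(": ", 2)
  [key, JSON.parse(json)]
end
from_json_lines = pairs.to_h

puts from_yaml == from_json_lines # prints true
```

One caveat under this assumption: Psych follows YAML 1.1, so a key that is exactly yes, no, on, or off would load as a boolean. Dotted keys like forms.buttons.submit are unaffected.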
Wanted to add: I have merged main into this branch a few times and scripted the work needed to "true-up" with main:
./scripts/yml_fix_merge_conflicts --force
make normalize_yaml
The script merges new keys into the combined/flattened yml files, so it accepts whatever updated keys are on main.
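The true-up behavior (new keys merged in, main's values winning) can be sketched as a plain hash merge over already-flattened translations. This is hypothetical: true_up and TrueUpResult are not from scripts/yml_fix_merge_conflicts, and the sketch does not try to handle keys that were deleted on main.

```ruby
# Hypothetical result holder: the merged flat hash plus which keys changed.
TrueUpResult = Struct.new(:merged, :added, :updated)

# Keeps every key from both sides; when both define a key, main's value wins.
# The merged hash is re-sorted by key so the written file stays stable.
def true_up(branch_flat, main_flat)
  added = main_flat.keys - branch_flat.keys
  updated = main_flat.keys.select { |k| branch_flat.key?(k) && branch_flat[k] != main_flat[k] }
  merged = branch_flat.merge(main_flat).sort.to_h
  TrueUpResult.new(merged, added.sort, updated.sort)
end
```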
i18n_spec.rb includes a flatten_hash helper, which I assume will be unnecessary after these changes. Removing it could be saved for a follow-on pull request to reduce the scope here.