Add more API languages.

Open ariedov opened this issue 6 years ago • 39 comments

Hey, love this repo!

I play DnD in Ukraine, and we use Russian as our primary language for the parties, so I would love to have this API available in different languages.

Maybe creating folders like en, ru and just storing the different json files there would be a good option?

I am ready to contribute, just don't want to deploy my own API :)

ariedov avatar Aug 24 '19 12:08 ariedov

That seems like a pretty interesting idea. I'm a little concerned about changing the folder structure (or in this case adding a folder structure) for the existing files. My other main concern is that we would need someone to maintain the different languages as well as make sure changes in files in one language end up propagating to the other languages. I, unfortunately, do not speak or write in Russian.

bagelbits avatar Sep 05 '19 21:09 bagelbits

It might be better to somehow store the different languages together, but that still wouldn't handle the divergence that comes with maintaining more than one language.

bagelbits avatar Sep 24 '19 17:09 bagelbits

Were you thinking just the descriptions or all fields converted to Russian and, inevitably, other languages?

bagelbits avatar Nov 01 '19 19:11 bagelbits

Yeah, pretty much. And also having the ability to request a specific language in a GET param.

ariedov avatar Nov 02 '19 16:11 ariedov

Does the SRD exist in any other languages currently? There is prob a risk of copyright infringement as well, validating that translations are not pulling from proprietary (D&D) info. Just pointing that out as something to think about.

benjaminapetersen avatar Nov 05 '19 20:11 benjaminapetersen

I am not a lawyer.

@benjaminapetersen yes the SRD exists in other languages. According to the OGL, it is allowed to translate the SRD as long as the translation is also under OGL. So yes, you may translate the SRD if you want. You shouldn't be able to get sued for that.

But currently there is no official translation of the SRD. (By "official", I mean WotC-approved.) There are official translations of books, but not of the SRD.

Therefore I don't see why this should be included and part of this project goal. Plus, if things are translated from the D&D books and not the SRD, we have no way to know that and this project could be hosting copyright-infringing material without anyone noticing.

It's already hard to keep non-SRD monsters, spells, races, and subclasses out of this repo, I don't see why we should take the burden of doing it in languages we don't understand.

ogregoire avatar Feb 28 '20 14:02 ogregoire

Hi guys!

I'd love to have this API available in Spanish. I understand the problems @ogregoire is pointing out, and the others about changing the project structure or maintaining the language. My proposal would be:

  • Any PR adding a new language must be sent together with a link to the published SRD translation, in order to check that it follows the OGL (e.g. http://srd.nosolorol.com/DD5/index.html).

  • About the structure. All texts could be replaced with language keys (e.g. "Barbarian" to "barbarian", "Skill: Animal Handling" to "skill-animal-handling") and a new .json file created per language (enUS.json, esES.json, ruRU.json...), linking each language key to the translated text:

enUS.json

{
   "barbarian": "Barbarian",
   "skill-animal-handling": "Skill: Animal Handling",
   ....
}

Then, during the build process and before refreshing the database, automatically create new localized files (5e-SRD-Classes-enUS.json, 5e-SRD-Classes-esES.json...), replacing those keys with the translated text and updating all the URLs to include the language (/api/proficiencies/skill-animal-handling to /api/en/proficiencies/skill-animal-handling or /api/es/proficiencies/skill-animal-handling), and then deploy'em to the db.

  • About the maintenance. Following my proposal there will be multiple language files with the translations. If you add a new key to the enUS one, you don't need to worry about updating the other languages. The replacement process described in the previous step can be configured to take the key from the enUS.json file when that key is not found in the language being processed. That way, if a translation is missing, at least the original English text will appear until the key is added for that language.
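A minimal sketch of the replacement step proposed above, assuming in-memory dictionaries in place of the real JSON files. The function name `localize`, the `name`/`url` record layout, and the sample Spanish entry are illustrative assumptions, not something taken from the repo.

```python
from typing import Any, Dict

# Hypothetical language files: language key -> translated text.
EN = {"barbarian": "Barbarian", "skill-animal-handling": "Skill: Animal Handling"}
ES = {"barbarian": "Bárbaro"}  # incomplete on purpose, to exercise the fallback


def localize(record: Dict[str, Any], lang: str, table: Dict[str, str],
             fallback: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of record with its name key resolved and its URL prefixed."""
    out = dict(record)
    key = record["name"]
    # Use the translation when present, otherwise the English text,
    # otherwise keep the raw key so the gap stays visible.
    out["name"] = table.get(key, fallback.get(key, key))
    # /api/proficiencies/x -> /api/<lang>/proficiencies/x
    out["url"] = record["url"].replace("/api/", f"/api/{lang}/", 1)
    return out


records = [
    {"name": "barbarian", "url": "/api/classes/barbarian"},
    {"name": "skill-animal-handling", "url": "/api/proficiencies/skill-animal-handling"},
]
spanish = [localize(r, "es", ES, EN) for r in records]
```

Here the second record has no Spanish entry, so it keeps the English text while still getting the `/api/es/` URL.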

:)

carloslancha avatar Mar 24 '20 02:03 carloslancha

This could work, indeed. But that'd require a lot of work to do the mapping which I don't know how to do. @bagelbits any insight on how to do so?

However, I wouldn't go as far as including the country (enUS, esES) yet, because there are simply not enough translations. Also, I don't like the file naming: usually, in the programming languages I know, different locales are named as _<language> or _<language>_<country>, where <language> is the two-character ISO 639-1 representation of a language and <country> would be the two-character ISO 3166-1 representation of a country.

So basically, I'd recommend using:

5e-SRD-Classes_en.json
5e-SRD-Classes_es.json
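To make the naming rule concrete, here is a small sketch of a helper that builds such file names. The function `localized_filename` is hypothetical, and it only checks the shape of the codes (two lowercase letters for the language, two uppercase letters for the country), not whether they are actually registered ISO codes.

```python
import re
from typing import Optional


def localized_filename(base: str, language: str, country: Optional[str] = None) -> str:
    """Build '<base>_<language>.json' or '<base>_<language>_<country>.json'."""
    if not re.fullmatch(r"[a-z]{2}", language):
        raise ValueError(f"expected a 2-letter lowercase language code, got {language!r}")
    suffix = language
    if country is not None:
        if not re.fullmatch(r"[A-Z]{2}", country):
            raise ValueError(f"expected a 2-letter uppercase country code, got {country!r}")
        suffix += f"_{country}"
    return f"{base}_{suffix}.json"


english = localized_filename("5e-SRD-Classes", "en")
spanish_spain = localized_filename("5e-SRD-Classes", "es", "ES")
```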

ogregoire avatar Mar 24 '20 10:03 ogregoire

I'm ok with that naming, I was just following the same pattern of the current files (using -) and adding the country just to have it prepared for the future, but you're right, there's not enough translations yet.

For the mapping I was thinking of a simple replacement script that gets the value of the name keys in each .json (that's the language key), looks up the language key in the language file and replaces the original value with the translated one.

I can try later to write a POC for this.

The more tedious work will be replacing the current values with the keys, but I think I can automate it too to replace the original files values with the keys and generate the first language (en).

carloslancha avatar Mar 24 '20 14:03 carloslancha

How would you deal with incomplete or in progress translation?

ogregoire avatar Mar 24 '20 15:03 ogregoire

By taking the language keys missing from the incomplete or in-progress translation from the "master" language, English.

What I've seen in several projects I worked on is to use the master language's text and append a marker such as "(Copy from English)" to the end of it.

carloslancha avatar Mar 24 '20 16:03 carloslancha

Here you can find the POC: https://github.com/bagelbits/5e-database/pull/158

carloslancha avatar Mar 25 '20 02:03 carloslancha

I have some thoughts but I'll have to come back to this in a day or two. I haven't been exactly in the best headspace the last few days. Though I do like the direction where this is going.

bagelbits avatar Mar 25 '20 18:03 bagelbits

Sorry that took so long. I left a comment on that PR just for how we're keying everything. I think we could probably clean it up a bit based off of that but I really like this approach. It's simple and elegant, my favorite way to solve a problem. :D

bagelbits avatar Apr 03 '20 21:04 bagelbits

If we don't want values to be lists, I'd suggest keys along the lines of LANG-KEY-strength-description-1 instead? What do y'all think? The only thing that I think could go wrong with this is key collisions?
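A rough sketch of what that keying could look like, assuming `LANG-KEY` is meant as a literal prefix and that entries are numbered from 1. Both function names are made up for illustration, and the sample strings are placeholders rather than SRD text.

```python
from typing import Dict, List, Union


def flatten_to_keys(index: str, field: str,
                    value: Union[str, List[str]]) -> Dict[str, str]:
    """Turn a string or list of strings into one language key per string."""
    parts = [value] if isinstance(value, str) else list(value)
    return {f"LANG-KEY-{index}-{field}-{i}": text
            for i, text in enumerate(parts, start=1)}


def merge_without_collisions(target: Dict[str, str], new: Dict[str, str]) -> None:
    """Add new keys to target, refusing to overwrite a key with different text."""
    for key, text in new.items():
        if key in target and target[key] != text:
            raise KeyError(f"language key collision: {key}")
        target[key] = text


language_file: Dict[str, str] = {}
merge_without_collisions(
    language_file,
    flatten_to_keys("strength", "description", ["first paragraph", "second paragraph"]),
)
```

Failing loudly on a collision at build time is one way to keep two different texts from silently sharing a key.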

bagelbits avatar Apr 03 '20 21:04 bagelbits

@carloslancha @bagelbits @fergcb Hi, I asked last week about collaborating on other languages and found this ticket. I've made a first JSON version of the monsters from the Spanish SRD. It has a slightly different schema, but I think it could be useful for this ticket. You can find it at:

https://github.com/Javrd/spanish-srd5.1-crawl/releases/tag/v0.1.0

Javrd avatar Nov 08 '20 21:11 Javrd

@Javrd That's really useful! I think the first step is to pick up the work from the POC. This would break all of the English language text out into a separate doc that could then be hot-swapped for alternative language files. I think the current state is that the POC is sound, but naming conventions need to be updated, and I think there are a bunch of merge conflicts, so the language file would probably have to start over.

bagelbits avatar Nov 08 '20 21:11 bagelbits

I'm worried about how this will actually get stored in the backend. Our json <-> mongodb pipeline would need to be altered a bit.

Do we make one database per language? Keep languages as separate collections? Do we include translations in the documents themselves?

https://stackoverflow.com/questions/23802834/multilingual-data-modeling-on-mongodb

There's a few good approaches on this SO question that are worth exploring or feeling out

Redmega avatar Jul 27 '21 15:07 Redmega

Hmmmm. I think either separate db per language or separate collections? I'm trying to think about how to support this from a GraphQL standpoint.

I guess it also begs the question on the API: how do we want to distinguish which language? Would that be in the URL or as a param?

bagelbits avatar Jul 27 '21 19:07 bagelbits

I think the API side can be flexible. We can default to something like /api/:lang/[...], and have redirects in place from a middleware detecting the Accept-Language header. Idk if a query param is the right call here.
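Something like this is what I have in mind for the detection part, as a language-neutral sketch rather than real middleware. The function names and the list of supported languages are assumptions, and the parsing is a simplified reading of the Accept-Language format (comma-separated tags with optional `q` weights); it does not handle `*` or requests that already carry a language segment.

```python
from typing import List


def pick_language(accept_language: str, supported: List[str], default: str = "en") -> str:
    """Pick the best supported language from an Accept-Language header value."""
    candidates = []
    for item in accept_language.split(","):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        q = 1.0  # a tag without an explicit weight counts as q=1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        candidates.append((q, tag))
    # Highest weight first; sorted() is stable, so header order breaks ties.
    for q, tag in sorted(candidates, key=lambda c: -c[0]):
        primary = tag.split("-")[0]  # "ru-ru" -> "ru"
        if q > 0 and primary in supported:
            return primary
    return default


def redirect_target(path: str, accept_language: str, supported: List[str]) -> str:
    """Map /api/<rest> to /api/<lang>/<rest> for the detected language."""
    lang = pick_language(accept_language, supported)
    rest = path[len("/api/"):] if path.startswith("/api/") else path.lstrip("/")
    return f"/api/{lang}/{rest}"


target = redirect_target("/api/spells/fireball", "es-ES,es;q=0.9,en;q=0.8", ["en", "es"])
```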

Redmega avatar Jul 28 '21 03:07 Redmega

Hey!

I just created a pull request (#445) for another approach to multilingual support. It allows us to parse the source data and separate it into what should be translatable (locale) and what shouldn't be (templates). It also allows us to build the source data back together with an altered locale file, resulting in a translated version of the database.

Would love to hear your thoughts on it!

djurnamn avatar Feb 24 '22 01:02 djurnamn

Oh dang. I completely forgot to encapsulate the alternative design we came up with in the Discord. I should do that here. I'll take a look at your PR though.

bagelbits avatar Feb 24 '22 04:02 bagelbits

I didn't know there was a Discord 😅, I'll check that out and get up to speed. Okay cool, let me know if you have any questions about it!

djurnamn avatar Feb 24 '22 08:02 djurnamn

@djurnamn Right. So. Here's my alternative suggestion:

I've been thinking about the multi-language support for the API a little bit more. And I think the design of one set of collections per language might be flawed/does not scale. On the one hand, it means you can just copy the file of all text from one language, and translate it in line. However, I don't think the models in the API will easily support hot swapping which collection you're talking to based on the incoming language request. And I don't want to add a new set of models for each new supported language. The API should not care about new languages that get added after we start supporting them.

We can handle this one of two ways.

Option A

Convert any string or array of strings to a hash where the key is the ISO language code and the value is the string/array in that language:

{
  "description": {
    "en_us": "something",
    "pt_br": "algo",
    "ja_jp": "なにか"
  }
}

or

{
  "description": {
    "en_us": ["something"],
    "pt_br": ["algo"],
    "ja_jp": ["なにか"]
  }
}

Option B

Option B is similar to Option A, except backwards compatible. Namely we keep strings and arrays of strings the same. However, we add an additional key for each. The key would be same but we append ::localization to it. For example:

{
  "description": "something",
  "description::localization": {
    "en_us": "something",
    "pt_br": "algo",
    "ja_jp": "なにか"
  }
}

or

{
  "description": "something",
  "description::localization": {
    "en_us": ["something"],
    "pt_br": ["algo"],
    "ja_jp": ["なにか"]
  }
}

Either is a pretty massive change, but this is an exceptionally complicated feature. I'm honestly, leaning towards Option A, but I could be convinced for B.
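To make the difference concrete, here's a minimal sketch of both shapes. Everything in it is an assumption for illustration: the `TRANSLATABLE` set, the function names, and the idea that a reader falls back to `en_us` when a language is missing.

```python
from typing import Any, Dict

TRANSLATABLE = {"description"}  # hypothetical: which fields hold translatable text


def to_option_a(doc: Dict[str, Any], lang: str = "en_us") -> Dict[str, Any]:
    """Option A: replace each translatable value with {lang: value}."""
    return {k: ({lang: v} if k in TRANSLATABLE else v) for k, v in doc.items()}


def to_option_b(doc: Dict[str, Any], lang: str = "en_us") -> Dict[str, Any]:
    """Option B: keep the original value and add a '<key>::localization' map."""
    out: Dict[str, Any] = {}
    for k, v in doc.items():
        out[k] = v
        if k in TRANSLATABLE:
            out[f"{k}::localization"] = {lang: v}
    return out


def read_field(doc: Dict[str, Any], field: str, lang: str,
               default_lang: str = "en_us") -> Any:
    """Read a field from an Option A document, falling back to default_lang."""
    value = doc[field]
    if field in TRANSLATABLE:
        return value.get(lang, value[default_lang])
    return value


source = {"index": "example", "description": ["something"]}
option_a = to_option_a(source)
option_b = to_option_b(source)
```

With Option A every reader has to go through something like `read_field`, which is the breaking part; with Option B old readers keep working but the English text is stored twice.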

bagelbits avatar Feb 24 '22 10:02 bagelbits

You can find the original post in Discord here.

And if you haven't joined the Discord server yet. Here's the invite.

bagelbits avatar Feb 24 '22 10:02 bagelbits

Okay, that's cool! My populate templates script could fairly easily be modified to put the data back together in either of those shapes. And I could extend the part currently reading from one locale file to iterate through a locales folder, allowing us to rebuild the source files with any set of languages we like.

Thanks for the invite! :)

djurnamn avatar Feb 24 '22 15:02 djurnamn

Excellent! Yeah, my thoughts are you would basically build two scripts. One is a throwaway script that just coerces the data into this new shape. The second is a helper/tool script that will just prepare the database for a new language. Like adding in "pt_br": "", into every localization map.
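A rough sketch of what that second script could do, assuming the Option A shape. The function name is made up, and two things are assumptions on my part: a map is treated as a localization map when it contains the base locale key (`en_us`), and array-valued fields get an empty list instead of an empty string.

```python
from typing import Any


def add_language(node: Any, lang: str, base: str = "en_us") -> Any:
    """Recursively add an empty entry for lang to every localization map."""
    if isinstance(node, list):
        return [add_language(item, lang, base) for item in node]
    if not isinstance(node, dict):
        return node
    if base in node:  # assumption: a map holding the base locale is a localization map
        updated = dict(node)
        empty = [] if isinstance(node[base], list) else ""
        updated.setdefault(lang, empty)  # never overwrite an existing translation
        return updated
    return {key: add_language(value, lang, base) for key, value in node.items()}


doc = {
    "index": "example",
    "name": {"en_us": "something", "pt_br": "algo"},
    "description": {"en_us": ["something"]},
    "nested": [{"name": {"en_us": "something"}}],
}
prepared = add_language(doc, "ja_jp")
```

It returns a new structure rather than mutating the input, so it is safe to run more than once or for a language that is already partly translated.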

bagelbits avatar Feb 24 '22 19:02 bagelbits

Yeah, that sounds good. Let me know how I can help! I think at least the logic for distinguishing between translatable and non-translatable values in my script could be useful for that.

It would be cool to have the locales separately in some standardized format (like WebExtensions json) so that they can be pulled into, and maintained in, a translation management system. And perhaps then, the second script you mention could optionally parse the locale files and add their values in the localization map.

djurnamn avatar Feb 24 '22 21:02 djurnamn

@djurnamn Sorry for taking so long to respond. However, we now have semantic versioning for the docker images that get built for the DB, so I feel way more comfortable with the breaking change this will cause.

> It would be cool to have the locales separately in some standardized format (like WebExtensions json) so that they can be pulled into, and maintained in, a translation management system. And perhaps then, the second script you mention could optionally parse the locale files and add their values in the localization map.

Can you say more about this? Are you saying having the locale files being separate from the rest of the data similar to your initial proposal? That is technically doable if it gets all stitched together before getting shoved into the DB.

I think I'm still leaning towards Option A if we go that route. Thoughts?

bagelbits avatar Mar 26 '22 19:03 bagelbits

Hey @bagelbits! Oh, that's cool!

Yeah, I guess that just felt like a more manageable way to maintain the translated content. The compiled version would still be what you outlined in Option A. If the build script, that combines the translatable and non-translatable content into the preferred format, is outside of the scope for what this repo should be, I could just maintain that separately.

I'll start working on a new version of the build script that outputs the compiled data in the Option A format.

djurnamn avatar Mar 31 '22 15:03 djurnamn