Locales

Open peterbe opened this issue 6 years ago • 10 comments

How are we going to do the equivalent of https://github.com/mdn/stumptown-experiment/tree/master/content/html/elements/video in French?

Also, apart from the title almost all of the stuff in https://github.com/mdn/stumptown-experiment/blob/master/content/html/elements/video/meta.yaml is "locale agnostic".

We could get fancy and add https://github.com/mdn/stumptown-experiment/tree/master/content/html/elements/video/fr/ which would contain attributes/, examples/, contributors.md, prose.md. And lastly, it could contain a meta.yaml file that is everything that ../meta.yaml is but with the French "extras". E.g.

title: "<video>: L'élément vidéo intégré"
mdn-url: https://developer.mozilla.org/fr/docs/Web/HTML/Element/video
tags:
    group: Image et multimédia
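
A minimal sketch of how a locale file holding only the "extras" could be layered over the shared file, assuming both YAML files have already been parsed into dictionaries. The function name `merge_meta` and the `browser-compatibility` key are illustrative assumptions, not part of stumptown.

```python
from copy import deepcopy


def merge_meta(base: dict, override: dict) -> dict:
    """Return a copy of base with override's values layered on top.

    Nested dictionaries are merged key by key; any other value in
    override replaces the value in base.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_meta(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical parsed contents of ../meta.yaml (shared) and fr/meta.yaml (extras).
base_meta = {
    "title": "<video>: The Video Embed element",
    "mdn-url": "https://developer.mozilla.org/en-US/docs/Web/HTML/Element/video",
    "tags": {"group": "Image and multimedia"},
    "browser-compatibility": "html.elements.video",  # assumed locale-agnostic key
}
fr_meta = {
    "title": "<video>: L'élément vidéo intégré",
    "mdn-url": "https://developer.mozilla.org/fr/docs/Web/HTML/Element/video",
    "tags": {"group": "Image et multimédia"},
}

french = merge_meta(base_meta, fr_meta)
```

With this shape, the French file only needs to list the keys that differ; everything else falls through from the shared file.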

peterbe avatar May 22 '19 14:05 peterbe

By the way, and pardon me for talking about something I'm new to, but how about putting the title and the tags into the prose.md. E.g.

# <video>: The Video Embed element

<!-- short-description -->
The **HTML Video element** (**`<video>`**) embeds a media player which
supports video playback into the document.

<!-- overview -->
You can use `<video>` for audio content as well, but the [`<audio>`](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/audio)
element may provide a more appropriate user experience.

...

It would make browsing https://github.com/mdn/stumptown-experiment/blob/master/content/html/elements/video/prose.md in GitHub a lot more pleasant. It would also solve the problem of separating configuration from useful user-facing strings. I.e. the meta.yaml would immediately become language agnostic.

Alternatively, it could look like this (to avoid formatting entirely):

---
title: '<video>: The Video Embed element'
tags: 'Image and multimedia'
---

<!-- short-description -->
The **HTML Video element** (**`<video>`**) embeds a media player which
supports video playback into the document.

<!-- overview -->
You can use `<video>` for audio content as well, but the [`<audio>`](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/audio)
element may provide a more appropriate user experience.

...
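
A rough sketch of how a build step might read a file laid out like this, assuming a `---` delimited header with flat `key: value` lines and `<!-- name -->` comments as section markers. The helper names are hypothetical, and a real build would more likely hand the header to a proper YAML parser.

```python
import re

# A line consisting only of an HTML comment such as <!-- overview -->.
SECTION_MARKER = re.compile(r"^<!--\s*([\w-]+)\s*-->\s*$", re.MULTILINE)


def split_front_matter(text: str):
    """Split a '---' delimited header from the Markdown body.

    Only flat `key: value` lines are handled here.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)
    except ValueError:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        # Split on the first colon only, so colons in the value survive.
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip().strip("'\"")
    return meta, "\n".join(lines[end + 1:])


def split_sections(body: str) -> dict:
    """Map each section marker name to the Markdown that follows it."""
    parts = SECTION_MARKER.split(body)
    # parts is [text before first marker, name1, text1, name2, text2, ...]
    return {name: content.strip() for name, content in zip(parts[1::2], parts[2::2])}


sample = """---
title: '<video>: The Video Embed element'
tags: 'Image and multimedia'
---

<!-- short-description -->
The **HTML Video element** embeds a media player.

<!-- overview -->
You can use `<video>` for audio content as well.
"""

meta, body = split_front_matter(sample)
sections = split_sections(body)
```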

Browsing .md files like that, in GitHub, is pretty pleasant. For example

peterbe avatar May 22 '19 14:05 peterbe

Another approach to localization to consider: treating translation as just another Stumptown consumer (in other words, translating the JSON strings, not the source files).

There are some possible benefits to this approach. For example, translation invalidation might be a bit more robust (e.g., new comments in a source file won't invalidate its translations). We could translate content we pull in from external sources which are not themselves translated (e.g., BCD only comes in English right now). And it would simplify handling the source in this repo (so when we inevitably discover some bulk change needs to be made to all our meta.yaml files, we only have to figure out how to do that for one language).
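
To make the invalidation point concrete, here is a minimal sketch that treats every string in the built JSON as a translation unit and hashes it, so a translation only becomes stale when the text that reaches the output changes. The JSON shape and all function names are assumptions for illustration, not stumptown's actual output.

```python
import hashlib


def iter_strings(node, path=()):
    """Yield (path, string) for every string value in a nested JSON structure."""
    if isinstance(node, str):
        yield "/".join(path), node
    elif isinstance(node, dict):
        for key in sorted(node):
            yield from iter_strings(node[key], path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_strings(item, path + (str(index),))


def translation_units(built: dict) -> dict:
    """Map each string's path to a SHA-256 digest of its text."""
    return {
        path: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for path, text in iter_strings(built)
    }


def stale_paths(old_units: dict, new_units: dict) -> set:
    """Paths whose text is new or has changed since the last translation."""
    return {path for path, digest in new_units.items() if old_units.get(path) != digest}


# Hypothetical built output before and after an edit to the overview.
built_before = {
    "title": "<video>: The Video Embed element",
    "prose": {
        "short_description": "Embeds a media player.",
        "overview": "Can also play audio.",
    },
}
built_after = {
    "title": "<video>: The Video Embed element",
    "prose": {
        "short_description": "Embeds a media player.",
        "overview": "Can also play audio, but <audio> may fit better.",
    },
}

stale = stale_paths(translation_units(built_before), translation_units(built_after))
```

A source comment that never reaches the built JSON leaves every digest unchanged, so nothing is flagged. This sketch does not yet separate translatable strings from data such as URLs; that would need a list of translatable paths.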

ddbeck avatar May 22 '19 16:05 ddbeck

Interesting! In Kuma we have that Puente (or whatever it's called) that sleuths from the .py and .js files what the defaults for gettext are. We could do that and have it ship to another git repo that gets synced with Pontoon. Who knows a lot about the feasibility of, and interest in, this?

peterbe avatar May 22 '19 17:05 peterbe

That's a good question. I know there was talk of a discussion about localization with Sphinx at All Hands (which I won't be able to attend, unfortunately). Now that I think about it, it'd probably be wise to have that discussion before getting too involved in the particulars. :-D

ddbeck avatar May 23 '19 12:05 ddbeck

Paging @SphinxKnight, who I think will be at the All Hands, and yes, I hope we will talk about this there. (I'm sorry again that you won't be there, @ddbeck.)

I like @peterbe's idea about making meta.yaml language agnostic and would probably support putting the title in prose.md unless I can remember why we didn't do that in the first place. For tags, though, I wonder if it might be better to treat them as not translatable. In Kuma we treat them as translatable and get in a terrible mess as a result, since we want to write code that uses tags to categorise pages (e.g. https://github.com/mdn/kumascript/blob/master/macros/APIRef.ejs#L69-L85). I suppose it depends on whether you want to expose them to humans, but I'm not sure we do (we do expose them to humans in Kuma, but I don't think it's a very useful feature). I'm still not sure what we are going to do with tags in stumptown, although we might use them for things like building sidebars, as Kuma does.

Another approach to localization to consider: treating translation as just another Stumptown consumer (in other words, translating the JSON strings, not the source files).

That is indeed a really interesting idea!

wbamberg avatar May 23 '19 20:05 wbamberg

Thanks for paging me in @wbamberg :) @ddbeck I'm not sure I understand what you mean by

Another approach to localization to consider: treating translation as just another Stumptown consumer (in other words, translating the JSON strings, not the source files).

What would be the content a localizer would translate in the end?

Regarding @peterbe's idea of meta.yaml as locale-agnostic, I'm 100% in favor of it.

Will, regarding tags, if I want to be pedantic, I would love Kuma to actually have a system for tag translation. I just think it doesn't have one so far (currently there is no way to tell that the "Méthode" tag is "linked" to the "Method" tag, and that hurts). I second your views regarding tags: no translation needed if they are not directly exposed or not really useful (as far as I can tell). No need to risk another feature if it's not really helpful.
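
One way to picture the "linked tags" problem: if pages carried a canonical, untranslated tag id and display labels lived in a separate table, code could categorise on the id while a label is looked up only when a tag is shown to a human. This is a sketch under that assumption; the table, slugs and function names are all hypothetical.

```python
# Hypothetical label table: canonical tag id -> locale -> display label.
TAG_LABELS = {
    "method": {"en-US": "Method", "fr": "Méthode"},
    "property": {"en-US": "Property", "fr": "Propriété"},
}


def tag_label(tag_id: str, locale: str, fallback: str = "en-US") -> str:
    """Return the label for a tag in a locale, falling back to en-US, then the id."""
    labels = TAG_LABELS.get(tag_id, {})
    return labels.get(locale) or labels.get(fallback) or tag_id


def pages_with_tag(pages: list, tag_id: str) -> list:
    """Categorise on the canonical id, so the result is the same in every locale."""
    return [page["slug"] for page in pages if tag_id in page["tags"]]


# Hypothetical pages; the tags are ids, never translated strings.
pages = [
    {"slug": "Web/API/Node/appendChild", "tags": ["method"]},
    {"slug": "Web/API/Node/firstChild", "tags": ["property"]},
]

methods = pages_with_tag(pages, "method")
```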

Last but not least, and I would really appreciate your views on this one: the location of the localizable Markdown files. In the current state, all of the .md files are dispersed under directories for each article/concept. If I want to answer the question "what needs translation?", I need to jump between multiple directories. As said in a previous thread, I'd somehow like to have a folder per locale with all the prose.md files (maybe for each section), e.g. content\css\locales\en-US\ or content\css\en-US\ containing each article's prose.md and maybe an examples directory.

I think such a structure would:

  • make it easier to think about localization by default (en-US would be considered as a locale among the others) :)
  • make it easier to integrate stumptown with existing tools for project localization.

Cons:

  • Definitely needs a new naming scheme so that this directory does not contain "prose1.md" ... but rather "border-block-start-width_prose.md". I think the pointer to the prose could be stored in meta.yaml (prose: "%locale%/border-block-start-width_prose.md").
  • Other locations contained in the meta.yaml should also be rerouted with this change.
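
A small sketch of how such a pointer could be resolved, assuming the `%locale%` placeholder is substituted at build time. The fallback to en-US when a translation is missing is an added assumption, as are the function names and the `content/css` root.

```python
from pathlib import PurePosixPath


def resolve_prose(template: str, locale: str, section_root: str) -> PurePosixPath:
    """Substitute the locale into the pointer stored in meta.yaml."""
    return PurePosixPath(section_root) / template.replace("%locale%", locale)


def resolve_with_fallback(template, locale, section_root, exists, default_locale="en-US"):
    """Return the localized path if it exists, else the default locale's path.

    `exists` is any callable that answers whether a path is present, which
    keeps this sketch independent of the real file system.
    """
    for candidate in (locale, default_locale):
        path = resolve_prose(template, candidate, section_root)
        if exists(path):
            return path
    raise FileNotFoundError(template)


template = "%locale%/border-block-start-width_prose.md"

# Hypothetical repository state: only the en-US prose has been written.
available = {PurePosixPath("content/css/en-US/border-block-start-width_prose.md")}

french_path = resolve_with_fallback(template, "fr", "content/css", available.__contains__)
```

Here `french_path` points at the en-US file, because no French file is listed in `available`.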

Looking forward to continuing this discussion.

SphinxKnight avatar May 25 '19 05:05 SphinxKnight

Thanks as always for your thoughtful comments!

I'd somehow like to have a folder per locale with all the prose.md files (maybe for each section).

I think one big challenge is that at present there isn't a clean separation between localizable and non-localizable content: it's not anywhere near as simple as meta.yaml / prose.md. For example:

  • examples are a mix of descriptive prose (currently "description.md") and source code ("example.js" etc.).
  • HTML attributes are described in MD files, but these contain a mix of descriptive prose and something more like data (attribute names and permitted values).
  • even things like BCD, that are primarily data, contain localizable prose in the notes.

So at the moment, in the stumptown source, it's hard to see how you could get a single folder per locale that contains the localizable content and only the localizable content.

If we did completely separate these, then I worry that it would make things much harder for en-US authors, by putting more distance and abstractions between, say, the example sources and the descriptions of them.

@ddbeck I'm not sure I understand what you mean by

Another approach to localization to consider: treating translation as just another Stumptown consumer (in other words, translating the JSON strings, not the source files).

At the moment in Stumptown content is provided in a mixture of ways, for example:

  • YAML for metadata,
  • Markdown files for prose,
  • JS, CSS, HTML files for examples,
  • JSON for BCD.

There's then a build step which converts this into a single JSON object. That JSON object is more clearly structured than the original source. For example, attribute MD files are parsed into a structure that explicitly represents things like the name, type, mandatoriness and possible values of an attribute, instead of that being implicitly represented in the MD. The JSON is used by code which builds complete MDN pages. We could think of it as the external interface of stumptown.

As I understand it (Daniel, please forgive me if I misunderstand you), the idea is for localizers to work on this JSON representation rather than the original content. Daniel lists some possible advantages. I think there are a couple more:

  • the JSON might be more easily consumed by tools for translators, since it's more clearly structured
  • the JSON can distinguish much better between localizable and non-localizable content

A couple of disadvantages though:

  • this puts a wedge between en-US and other locales, so is even further from your suggestion that "en-US would be considered as a locale among the others"
  • the JSON (currently) contains HTML not Markdown. That is, the build-JSON step currently does MD->HTML conversion. One of the reasons we are trying out MD as an authoring format is that it's easy to write, so if that's true it doesn't seem helpful to make localizers work with HTML.

But I think it's worth considering.

Looking forward to continuing this discussion.

Yes, for sure!

wbamberg avatar May 27 '19 18:05 wbamberg

@wbamberg yeah, that correctly captures my perspective.

it doesn't seem helpful to make localizers work with HTML

Yeah, that's something I hadn't considered. It might be that there's a middle ground there though, where translation isn't quite an ordinary consumer (e.g., getting some intermediate form of the structured content with Markdown instead of HTML).
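
As a sketch of what that middle ground might look like, the build could assemble the same structure either way and make the Markdown-to-HTML step optional, so translators receive Markdown and the site receives HTML. Everything here is hypothetical, including the `prose_format` field; `paragraphs_to_html` is only a stand-in for a real Markdown renderer.

```python
import html


def paragraphs_to_html(markdown_text: str) -> str:
    """Stand-in renderer: wraps blank-line-separated blocks in <p> tags."""
    blocks = [block.strip() for block in markdown_text.split("\n\n") if block.strip()]
    return "\n".join(f"<p>{html.escape(block)}</p>" for block in blocks)


def build_page(meta: dict, prose: dict, render=None) -> dict:
    """Assemble the structured page, converting prose only if a renderer is given."""
    convert = render if render is not None else (lambda text: text)
    return {
        "title": meta["title"],
        "prose_format": "html" if render is not None else "markdown",
        "prose": {name: convert(text) for name, text in prose.items()},
    }


meta = {"title": "<video>: The Video Embed element"}
prose = {"overview": "You can use `<video>` for audio content as well."}

for_site = build_page(meta, prose, render=paragraphs_to_html)
for_translators = build_page(meta, prose)
```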

ddbeck avatar May 28 '19 10:05 ddbeck

We had many discussions during this All Hands and one with @wbamberg led to me writing this: https://gist.github.com/SphinxKnight/df892b41e93045d3dfad07335e2c3be5 which describes some hypotheses for how locales could be structured.

SphinxKnight avatar Jun 21 '19 20:06 SphinxKnight

A quick follow-up on an example brought up by @Elchi3 (Microsoft Azure docs): MS uses a "repo per locale" strategy and relies on localization suppliers plus an "automation pipeline" to identify pages that need changes on a weekly basis. Ref: https://github.com/MicrosoftDocs/azure-docs.fr-fr/issues/469

SphinxKnight avatar Jun 25 '19 15:06 SphinxKnight