fluent Clarity about embedded HTML and escaping

Some examples use HTML snippets in the message, e.g. http://projectfluent.org/fluent/guide/text.html:

description =
    Loki is a simple micro-blogging
    app written entirely in <i>HTML5</i>.
    It uses FTL to implement localization.

The question then is what happens when this is used. I would expect fluent not to do any HTML escaping itself. It is therefore up to the bindings to always HTML-escape the entire returned string when it is inserted into the DOM (client-side) or into a chunk of HTML (server-side). If the message contains any interpolated user-supplied input, this is vital for correctness and security (XSS etc.), but in any case we should not expect translators to know HTML syntax and manually escape ampersands etc.

However, with the above message, the HTML tags would end up escaped as &lt;i&gt;HTML5&lt;/i&gt;, which would be rendered as the literal text <i>HTML5</i> rather than as HTML5 in italics. This is not what the example implies to me.
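To make the escaping concern concrete, here is a minimal sketch in Python using only the standard library. It assumes a binding that escapes the whole formatted string before inserting it into a page; the `message` variable just stands in for whatever a fluent implementation would return for the example above.

```python
import html

# One line of the example message, as a formatter might return it
# (no escaping done by fluent itself).
message = "app written entirely in <i>HTML5</i>."

# What a binding that always escapes would insert into the page.
escaped = html.escape(message)

print(escaped)
```

A browser would display the escaped form as visible angle brackets around HTML5, not as italic text.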

Looking around in this repo, it seems the current consensus is in agreement with what I've outlined above (see https://github.com/projectfluent/play/issues/2 for example), and therefore it is the examples that are misleading/confusing.

This leaves the problem of what happens when a translated string actually needs to embed HTML. This seems to be one solution: https://github.com/projectfluent/fluent/issues/16#issuecomment-351742530 . A more lightweight but less robust solution I had been thinking about was a naming convention (e.g. any message id that ends in -html is treated as HTML, anything else is not).
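A rough sketch of how the naming convention could look on the bindings side, in Python. The helper names `render_message` and `escape_args` are hypothetical and not part of any fluent implementation; `formatted` stands in for the string a fluent implementation returned for the given message id.

```python
import html


def escape_args(args: dict) -> dict:
    """Escape interpolated values before they are passed to an -html message.

    Hypothetical helper: the message itself is trusted markup, so any
    user-supplied arguments have to be escaped before interpolation.
    """
    return {name: html.escape(str(value)) for name, value in args.items()}


def render_message(message_id: str, formatted: str) -> str:
    """Return a string that is safe to insert into an HTML page.

    Hypothetical helper: ids ending in "-html" are treated as markup
    and passed through; everything else is escaped as plain text.
    """
    if message_id.endswith("-html"):
        return formatted
    return html.escape(formatted)
```

The weak point is visible here: correctness depends on every caller remembering to run `escape_args` for `-html` messages, which is part of why I call this less robust.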

It is vital for this to be really well defined (and simple to implement), otherwise you end up with XSS, or double escaping, or being unable to embed HTML in translated messages. I'm considering an implementation in Elm, and the only practical way it would work would be to compile FTL messages to Elm functions. For this to work, we'd need to know for every message what type of output (text/HTML) it was returning so that it can have the correct type signature. I'm also considering a Python implementation that would integrate into a Django project, and we'd again need to know very explicitly whether something is returning HTML or plain text.
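As an illustration of why the output type needs to be known per message, here is a minimal Python sketch of keeping text and HTML apart with a marker type. The `Html` class and `to_html` function are hypothetical names for this example only, not an existing fluent or Django API.

```python
import html


class Html(str):
    """Marker type (hypothetical): a string already safe to insert as HTML."""


def to_html(value: str) -> Html:
    """Convert a message result to HTML exactly once.

    Plain text is escaped; values already marked as Html pass through
    unchanged, which avoids double escaping.
    """
    if isinstance(value, Html):
        return value
    return Html(html.escape(value))
```

This only works if each message is known to produce either plain text or `Html`; a message whose kind is ambiguous would be escaped twice or not at all.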

Apr 12 '18 06:04 spookylukey

Hi! Thanks for the writeup! Before we dive in - are you familiar with DOM Overlays?

Here's documentation of the first version - https://github.com/projectfluent/fluent.js/wiki/DOM-Overlays and today we released v2 in fluent-dom 0.2.0.

DOM Overlays is how we approach DOM Fragment localization with safety and flexibility. Version 0.2 adds the ability for developers to provide elements in the source HTML that get merged with the translation. We'll have to document the new features in v2 :)

Apr 12 '18 06:04 zbraniecki

Thanks so much for that link, I think I had seen it before but had forgotten about it. I'm still at the stage of investigating fluent and seeing whether it fits my needs. I'm currently not thinking of using fluent-dom, because I've got use cases where it won't work (e.g. plain text emails), and because in some cases I really want server-side rendering (for all the usual reasons).

I guess fluent-dom may be the way to go in some cases though, or I might need to implement similar functionality if I were to go with server-side rendering. I have questions like: what happens if I'm generating a plain text email and there is a message like Mme { $surname }? It feels like there needs to be a way to communicate the context to the translator so that this kind of thing can be avoided (as per the proposal in #16).

Apr 12 '18 11:04 spookylukey

yep, semantic comments are meant to help with that!

Apr 19 '18 23:04 zbraniecki
