message-format-wg
Support messages in HTML
This thread is a spin-off of the conversation that began in requirements gathering (issue #3) about how to support content that comes from and/or is destined for HTML / the Web.
See previous comments:
- First comment from @zbraniecki - embedding messages in HTML
- Response to first comment from @nbouvrette - TMSes supporting HTML using placeholders
- Response to first comment from @mihnita - most CAT tools support HTML now
- Response to @mihnita from @zbraniecki - legacy l10n system DTD didn't support HTML
Thanks for creating this thread. I took the opportunity to also create one for inflections, which was the other topic that seemed to dilute the requirements discussion a bit.
@mihnita
In this day and age most CAT (Computer Aided Translation) tools support HTML out of the box.
Both CAT (computer-assisted translation) tools and TMSes (translation management systems) support HTML well, which is why I proposed leaving it out of scope when defining the syntax. As I mentioned in issue #2, I think it would help to have a clear definition of all the acronyms, because it is easy to get lost. On top of that, it would help to have a clear view of:
- Typical CAT/TMS interactions happening today in most businesses (expected file input/output, typical usage)
- Potential shift into the online CAT/TMS where the new CAT tools are included inside the TMS (SaaS)
Is this something we should do before resuming this discussion, or is it already clear to everyone? From what I know, I still don't understand why a linguistic syntax should also try to solve markup problems, given that most of those are already solved by linguistic tools (CAT/TMS).
From what I've seen, the newer (online) TMSes support XLIFF, often better than the more established ones. (I call them CAT tools if they include more than TM (Translation Memory).)
Related to definitions (maybe we should spin a new thread?), the way I see this:
- Translation Management System (TMS): ERP-like software that helps with the end-to-end localization process within businesses. This can include project management, cost management, vendor management, resource management, reporting, and translation memory management, and in some cases it will include CAT tools or at minimum be compatible with a CAT tool or editor. A TMS is responsible for supporting various file formats as input and filtering their content to extract strings that can be matched against previously translated strings, in order to keep cost and translation time as low as possible. It is also responsible for producing translated assets that do not tamper with the original file format, other than providing translated content.
- Computer-assisted translation (CAT) tools: typically software installed on a desktop and used by linguists. A CAT tool will often take XLIFF as input, which includes both the strings to be translated and a portion of the translation memory, up to a certain fuzziness, that is typically populated by a TMS.
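To make the "matched up to a certain fuzziness" part of these definitions concrete, here is a minimal sketch of a translation memory lookup. It is only an illustration: real TMS and CAT tools use their own matching algorithms and scoring, and `TRANSLATION_MEMORY`, `best_tm_match`, and the `0.75` threshold are hypothetical names and values chosen for this example. The similarity score comes from Python's standard `difflib.SequenceMatcher`, which stands in for whatever fuzzy metric a given tool actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source string -> stored translation.
TRANSLATION_MEMORY = {
    "Save the file": "Enregistrer le fichier",
    "Open the file": "Ouvrir le fichier",
}


def best_tm_match(source, memory, threshold=0.75):
    """Return (tm_source, translation, score) for the closest entry.

    Returns None when no entry reaches the threshold. The score is
    difflib's similarity ratio in [0, 1]; it is an assumption standing
    in for a tool-specific fuzzy-match percentage.
    """
    best = None
    for tm_source, translation in memory.items():
        score = SequenceMatcher(None, source, tm_source).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (tm_source, translation, score)
    return best
```

An exact match scores 1.0 and can be reused as-is, a near match such as "Save the files" is offered to the linguist as a fuzzy suggestion, and a string with nothing above the threshold is sent for translation from scratch.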
@zbraniecki
The limitation wasn't CAT. It was the l10n system we used (DTD! :)).
I am not sure I follow you on this one. Are you referring to the TMS or something else? Is this still a problem today?
Based on this morning's meeting, it's probably best to just make sure that the syntax doesn't conflict with HTML. I think @zbraniecki suggested that. For me, I think that SSML support is more important than HTML, but I can still see value in supporting HTML/XHTML in some way.
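As a rough illustration of the placeholder approach mentioned earlier in the thread, and of why the message syntax should not collide with markup, here is a small sketch that swaps HTML tags for numbered placeholders before translation and restores them afterwards. Everything in it is an assumption made for the example: the `{0}`-style placeholder form, the function names `protect_markup` and `restore_markup`, and the deliberately naive tag regex (it does not handle a `>` inside an attribute value, comments, or entities). It is not how any particular TMS implements its HTML filter.

```python
import re

# Naive pattern for an opening or closing HTML tag (assumption: no ">"
# appears inside attribute values).
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


def protect_markup(text):
    """Replace each tag with a numbered placeholder such as {0}.

    Returns the protected text and the list of original tags, in order.
    The {n} placeholder form is hypothetical.
    """
    tags = []

    def stash(match):
        tags.append(match.group(0))
        return "{%d}" % (len(tags) - 1)

    return TAG_RE.sub(stash, text), tags


def restore_markup(text, tags):
    """Put the original tags back in place of their {n} placeholders."""
    return re.sub(r"\{(\d+)\}", lambda m: tags[int(m.group(1))], text)


protected, tags = protect_markup('Click <a href="/help">here</a> now')
translated = protected.replace("Click", "Cliquez").replace("here", "ici").replace("now", "maintenant")
restored = restore_markup(translated, tags)
```

Note that the placeholder form chosen here uses braces, which is exactly the kind of character a message syntax is likely to reserve. That overlap is the conflict the comment above is warning about, whether the markup is HTML, XHTML, or SSML.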
Closing this issue as obsolete. Open new issues for specific HTML compatibility proposals or related requests.