pulldown-cmark-to-cmark

Simplify backslash escapes where possible

Open mgeisler opened this issue 2 years ago • 2 comments

Hi @Byron!

An issue was opened in the mdbook-i18n-helpers crate about how we treat backslashes in the translations: https://github.com/google/mdbook-i18n-helpers/issues/105.

As you might recall, the tooling there works by

  • parsing Markdown text into a Markdown AST
  • finding translatable text in the AST
  • turning the AST nodes into Markdown text (using this crate)
  • translating this text and turning it into a Markdown AST again

The third step here turns a Markdown file with

\x

into

\\x

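This is completely valid! According to the CommonMark rules for backslash escapes, \\x and \x both mean backslash-x (2 bytes): a backslash only acts as an escape when it is followed by an ASCII punctuation character, and x is not one.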
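To make the equivalence concrete, here is a minimal sketch of that rule in plain Rust. It is not how pulldown-cmark implements it: the function name `resolve_backslash_escapes` is made up for illustration, and it ignores code spans, raw HTML, autolinks and hard line breaks, where backslashes behave differently.

```rust
/// Resolve backslash escapes in a piece of inline Markdown text.
/// Illustrative helper only: a backslash followed by an ASCII punctuation
/// character yields that character; any other backslash stays literal.
fn resolve_backslash_escapes(markdown: &str) -> String {
    let mut out = String::with_capacity(markdown.len());
    let mut chars = markdown.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.peek().copied() {
            // `\*`, `\\`, `\$`, ...: drop the backslash, keep the punctuation.
            Some(next) if next.is_ascii_punctuation() => {
                out.push(next);
                chars.next();
            }
            // `\x`, or a backslash at the very end: the backslash is literal.
            _ => out.push('\\'),
        }
    }
    out
}
```

With this sketch, `resolve_backslash_escapes(r"\x")` and `resolve_backslash_escapes(r"\\x")` produce the same two-character string, a backslash followed by x.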

However, I can see how it could be confusing to people and tools which rely on a lot of backslashes, e.g., for LaTeX math like $\sqrt{\frac{1}{x}}$. Here, the translator will end up seeing the escaped backslashes: $\\sqrt{\\frac{1}{x}}$ because that is what we get back when we serialize the Markdown AST into Markdown text. It would be easier to work with the unescaped backslashes in this case.

So I'm proposing that pulldown-cmark-to-cmark emit the simplest escaped form for an escaped character.
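A rough sketch of what "simplest escaped form" could mean for backslashes, next to the always-double behavior described above. Both function names are hypothetical and none of this is taken from the crate. The sketch only looks at backslashes (a real serializer also has to escape other characters), and it assumes that a backslash before a newline or at the end of the text should still be doubled, to stay on the safe side.

```rust
/// Hypothetical helper: double a literal backslash only where a bare one
/// could be read as something else.
fn escape_backslashes_minimal(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        out.push(c);
        if c == '\\' {
            let must_escape = match chars.peek().copied() {
                // Before punctuation it would act as an escape; before a
                // newline it would become a hard line break.
                Some(next) => next.is_ascii_punctuation() || next == '\n',
                // Assumption: at the end of the text we cannot see what
                // will follow, so escape conservatively.
                None => true,
            };
            if must_escape {
                out.push('\\');
            }
        }
    }
    out
}

/// Hypothetical helper: double every literal backslash, matching the
/// output described in this issue.
fn escape_backslashes_always(text: &str) -> String {
    text.replace('\\', "\\\\")
}
```

For the math example, `escape_backslashes_minimal` leaves `$\sqrt{\frac{1}{x}}$` unchanged because every backslash there is followed by a letter, while `escape_backslashes_always` gives `$\\sqrt{\\frac{1}{x}}$`.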

mgeisler, Oct 25 '23 20:10

Thanks for sharing and for explaining the situation so well.

There is already a way to pass options to the conversion engine, and I wonder if a flag could be added to configure how backslashes are escaped?

I am pretty sure that there are edge cases where one form is more desirable than the other, and until this can be determined it would be good to make it configurable. Your help with this would definitely be appreciated.
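Purely as an illustration of what such a flag might look like: the enum `BackslashEscaping` and the function `render_backslash` below are invented for this sketch and are not part of pulldown-cmark-to-cmark's actual options. The idea is that the serializer would consult the setting each time it writes a literal backslash.

```rust
/// Hypothetical setting; not an existing pulldown-cmark-to-cmark option.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BackslashEscaping {
    /// Write every literal backslash as `\\` (the output described in this issue).
    Always,
    /// Write `\\` only where a bare `\` could be read differently.
    WhenNeeded,
}

/// Decide how one literal backslash is written, given the character that
/// follows it in the text (`None` at the end of the text).
fn render_backslash(mode: BackslashEscaping, next: Option<char>) -> &'static str {
    // Assumption: punctuation, a newline, or an unknown continuation all
    // make a bare backslash ambiguous.
    let ambiguous = match next {
        Some(c) => c.is_ascii_punctuation() || c == '\n',
        None => true,
    };
    match mode {
        BackslashEscaping::Always => "\\\\",
        BackslashEscaping::WhenNeeded if ambiguous => "\\\\",
        BackslashEscaping::WhenNeeded => "\\",
    }
}
```

Keeping `Always` as the default would leave existing output untouched, and callers who want the shorter form could opt in to `WhenNeeded`.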

Byron, Oct 26 '23 05:10

> Thanks for sharing and for explaining the situation so well.

You're very welcome! Since last night, I've gotten more information: https://github.com/lzanini/mdbook-katex/issues/100#issuecomment-1780611579. I now realize that the mdbook-katex preprocessor works on the raw Markdown! It doesn't even depend on pulldown-cmark 🙂

This makes my task in mdbook-i18n-helpers significantly more complex since I very much do use the Markdown AST.

Perhaps an answer would be to run mdbook-katex before the translation happens. I'll have to think a bit about this, but it would remove the need for me to worry about how backslashes are encoded by your crate.

> Your help with this would definitely be appreciated.

Thanks! I'll try to see if I can find someone interested in this!

mgeisler, Oct 26 '23 08:10