turndown icon indicating copy to clipboard operation
turndown copied to clipboard

Provide text escaping and replacement hooks for context-dependent escaping

Open martincizek opened this issue 5 years ago • 2 comments

Provide text escaping and replacement hooks to allow implementing comprehensive, context-dependent escaping. See domchristie/turndown#324 as an example, where it is necessary to avoid data corruption.

This PR supersedes domchristie/turndown#304.

martincizek avatar Jul 06 '20 12:07 martincizek

This looks interesting. I like the idea of text nodes being replaceable (although should they pre or post escaping?) Anyway, I'm a bit confused as to why we need both an escapes option and a textReplacement option. What would the code look like to customise escaping with these new options?

Hi @domchristie, this is related to our e-mail conversation and I should perhaps have it posted here:

One more thing to add about escaping, realized it when walking through the PRs and the README... I believe we should make TurndownService#escape deprecated in favour of #339.

#339 is now backward-compatible for people who overrode #escape, as the default textReplacement just calls it. But no other internals are exposed as the main class methods and textReplacement fits well among the other *replacements in advanced options. I believe it is also a good move towards introducing some pluggable escaping system, regardless of its implementation. It will even allow us to move the escaping system to another project without a fixed dependency.

I will update this PR as follows:

  • the code currently using the escapes setting will be considered the default textReplacement implementation.
  • TurndownService#escape() will be preserved as an extension point just for backward compatibility. It will be marked deprecated and the README will be updated accordingly.
  • the meaning of the escapes option will be better documented. I.e. if the users set it, it will be used by the default textReplacement implementation. If the users provide custom textReplacement implementation, it's up to them how they use the escapes option.

Now it is probably more clear what I meant by eventual externalising of the escaping builder. The builder (or its outcome) would plug into the textReplacement hook and the future escape rules would plug into the builder, should the user use the builder. :)

I consider this the best ratio between backward-compatibility and flexibility in subsequent design decisions. Does this make sense?

martincizek avatar Dec 04 '20 09:12 martincizek

P.S. Updated the comment to express the intention rather than the code dependencies (the backward compatibility would not work if it were implemented exactly how it was written before :)).

And regarding:

although should they pre or post escaping?

Actually the escaping is the standard thing to do within text node processing. In general, escaping details also depends on the node (i.e. context) and options indeed.

martincizek avatar Dec 04 '20 09:12 martincizek