turndown
turndown copied to clipboard
Provide text escaping and replacement hooks for context-dependent escaping
Provide text escaping and replacement hooks to allow implementing comprehensive, context-dependent escaping. See domchristie/turndown#324 as an example, where it is necessary to avoid data corruption.
This PR supersedes domchristie/turndown#304.
This looks interesting. I like the idea of text nodes being replaceable (although should they pre or post escaping?) Anyway, I'm a bit confused as to why we need both an
escapesoption and atextReplacementoption. What would the code look like to customise escaping with these new options?
Hi @domchristie, this is related to our e-mail conversation and I should perhaps have it posted here:
One more thing to add about escaping, realized it when walking through the PRs and the README... I believe we should make TurndownService#escape deprecated in favour of #339.
#339 is now backward-compatible for people who overrode #escape, as the default textReplacement just calls it. But no other internals are exposed as the main class methods and textReplacement fits well among the other *replacements in advanced options. I believe it is also a good move towards introducing some pluggable escaping system, regardless of its implementation. It will even allow us to move the escaping system to another project without a fixed dependency.
I will update this PR as follows:
- the code currently using the
escapessetting will be considered the defaulttextReplacementimplementation. TurndownService#escape()will be preserved as an extension point just for backward compatibility. It will be marked deprecated and the README will be updated accordingly.- the meaning of the
escapesoption will be better documented. I.e. if the users set it, it will be used by the defaulttextReplacementimplementation. If the users provide customtextReplacementimplementation, it's up to them how they use theescapesoption.
Now it is probably more clear what I meant by eventual externalising of the escaping builder. The builder (or its outcome) would plug into the textReplacement hook and the future escape rules would plug into the builder, should the user use the builder. :)
I consider this the best ratio between backward-compatibility and flexibility in subsequent design decisions. Does this make sense?
P.S. Updated the comment to express the intention rather than the code dependencies (the backward compatibility would not work if it were implemented exactly how it was written before :)).
And regarding:
although should they pre or post escaping?
Actually the escaping is the standard thing to do within text node processing. In general, escaping details also depends on the node (i.e. context) and options indeed.