Keeping/removing metadata content elements (e.g. script, style, title)

Open domchristie opened this issue 6 years ago • 2 comments

<script>, <style>, and <title> elements are not visible on a rendered web page; however, Turndown will output their contents, e.g.

turndownService.turndown('<script>alert("Hello world")</script>') // alert("Hello world")

Perhaps these could be removed by default? The behaviour could then be overridden with turndownService.keep (to render the elements wrapped in their tags) or by adding a rule. Alternatively, we could keep the default behaviour and add options to keep or remove each element, e.g. keepScript, removeScript, keepStyle, removeStyle, keepTitle, removeTitle?
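For illustration, here is a rough sketch of what the "adding a rule" route might look like with the current default behaviour. The rule name `dropMetadata` and the helper function are made up for this example; only `addRule`, `filter` and `replacement` are Turndown's own. It assumes `service` is a TurndownService instance.

```javascript
// Hypothetical rule object: the name and grouping are illustrative only,
// not something Turndown ships with.
const dropMetadataRule = {
  // Match the three metadata content elements discussed in this issue.
  filter: ['script', 'style', 'title'],
  // Returning an empty string drops the element and its contents.
  replacement: function () {
    return ''
  }
}

// `service` is assumed to be a TurndownService instance,
// e.g. created with `new TurndownService()`.
function dropMetadata (service) {
  service.addRule('dropMetadata', dropMetadataRule)
  return service
}
```

A rule like this is probably only worth it when the replacement needs to do more than return an empty string; for plain removal a shorter call should be enough.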

domchristie, Dec 20 '17 00:12

FWIW removing metadata content elements can be done with:

turndownService
  .remove(['script', 'style', 'title'])
  .turndown(…)

and keeping them (tags included):

turndownService
  .keep(['script', 'style', 'title'])
  .turndown(…)
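The two snippets above could also be wrapped in one small helper, as a sketch of the option-style behaviour floated in the original post. `applyMetadataPolicy` and its `policy` values are hypothetical names invented here, not Turndown options; the helper only relies on `remove` and `keep` returning the service for chaining, as they are used above.

```javascript
// Tags treated as "metadata content" in this issue.
const METADATA_TAGS = ['script', 'style', 'title']

// Hypothetical helper: `policy` is 'remove', 'keep', or anything else
// to leave Turndown's current default behaviour untouched.
// `service` is assumed to be a TurndownService instance.
function applyMetadataPolicy (service, policy) {
  if (policy === 'remove') return service.remove(METADATA_TAGS)
  if (policy === 'keep') return service.keep(METADATA_TAGS)
  return service
}

// Possible usage, assuming the `turndown` package is installed:
//   const TurndownService = require('turndown')
//   const service = applyMetadataPolicy(new TurndownService(), 'remove')
//   service.turndown('<title>Page</title><p>Hello world</p>')
```

Passing the service in, rather than requiring Turndown inside the helper, keeps the sketch independent of how the instance was created or configured.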

domchristie, Dec 22 '17 20:12

@domchristie Did you find additional tags worth removing?

astoilkov, Jun 02 '21 09:06