
Add XHTML sanitizer

Open • dteviot opened this issue 6 years ago • 1 comment

To safely preview output from the default parser, the HTML needs to be sanitized. A sanitizer may also be able to automatically fix some of the "unable to convert XXX to XHTML" errors. It needs to:

  • [ ] Remove all attributes that are not in a white list.
  • [ ] Change tags that are not known types (to <div>?)
  • [x] Handle invalid characters in text nodes, e.g. https://royalroadl.com/fiction/5288/how-to-avoid-death-on-a-daily-basis/chapter/69854/73-night-of-the-living-zombers has a text node holding a "Form Feed" character (value 0x0C). (Not to be confused with line feed.)
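
As a rough illustration of the third item, here is a minimal sketch that strips characters not permitted by the XML 1.0 `Char` production from a string. The function name `stripInvalidXmlChars` is hypothetical and this is not necessarily how WebToEpub implements the fix; it also assumes that dropping the character (rather than replacing it) is acceptable.

```javascript
// Sketch only: "stripInvalidXmlChars" is a hypothetical helper name, not
// taken from the WebToEpub source.
// XML 1.0 allows: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// The "u" flag makes the regex match whole code points, so valid surrogate
// pairs (e.g. emoji) are kept while lone surrogates are removed.
const INVALID_XML_CHARS =
    /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/gu;

function stripInvalidXmlChars(text) {
    // Assumption: invalid characters are simply dropped, not replaced.
    return text.replace(INVALID_XML_CHARS, "");
}
```

In a browser this could be applied to the `nodeValue` of each text node before serializing to XHTML. Form feed (0x0C) is removed, while tab, line feed and carriage return are kept.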

Initial attribute white list

  • For all elements
    • id
    • class
    • style
  • hyperlinks
    • href
  • image/svg
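
The white list above could be expressed roughly as follows. This is a sketch under stated assumptions: the names `GLOBAL_ATTRIBUTES`, `PER_TAG_ATTRIBUTES`, `isAllowedAttribute` and `removeDisallowedAttributes` are hypothetical, "hyperlinks" is assumed to mean `<a>` elements, and the image/svg entry is omitted because the issue does not list its attributes.

```javascript
// Sketch only: all names here are hypothetical, not from the WebToEpub source.
// Attributes allowed on every element, per the list in the issue.
const GLOBAL_ATTRIBUTES = new Set(["id", "class", "style"]);

// Extra attributes allowed per tag. Assumption: "hyperlinks" means <a>.
// The image/svg entry is left out because the issue does not enumerate it.
const PER_TAG_ATTRIBUTES = new Map([
    ["a", new Set(["href"])],
]);

function isAllowedAttribute(tagName, attributeName) {
    const name = attributeName.toLowerCase();
    if (GLOBAL_ATTRIBUTES.has(name)) {
        return true;
    }
    const perTag = PER_TAG_ATTRIBUTES.get(tagName.toLowerCase());
    return perTag !== undefined && perTag.has(name);
}

// Removes every attribute that is not on the white list from a DOM Element.
function removeDisallowedAttributes(element) {
    // Copy the names first: element.attributes is live, so removing entries
    // while iterating over it directly would skip some attributes.
    const names = Array.from(element.attributes, attr => attr.name);
    for (const name of names) {
        if (!isAllowedAttribute(element.tagName, name)) {
            element.removeAttribute(name);
        }
    }
}
```

Keeping the list as data rather than hard-coded checks would make it easy to extend once the image/svg attributes are decided.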

Note: this issue is related to https://github.com/dteviot/WebToEpub/issues/118

dteviot, Oct 02 '18 19:10

Nice, they're adding an API for this as well.

https://wicg.github.io/sanitizer-api/

Synteresis, Dec 01 '21 06:12