WebToEpub
Add XHTML sanitizer
To safely preview output from the default parser, the HTML needs to be sanitized. A sanitizer may also be able to automatically fix some of the "unable to convert XXX to XHTML" errors. It needs to:
- [ ] Remove all attributes that are not in a white list.
- [ ] Change tags that are not known types (to `<div>`?)
- [x] Remove invalid characters from text nodes, e.g. https://royalroadl.com/fiction/5288/how-to-avoid-death-on-a-daily-basis/chapter/69854/73-night-of-the-living-zombers has a text node holding a "Form Feed" character (value 0x0C). (Not to be confused with line feed, 0x0A.)
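
As a rough illustration of the invalid-character item, here is a minimal sketch that drops every code point outside the XML 1.0 `Char` production, which excludes Form Feed (0x0C). The function name `stripInvalidXmlChars` and the choice to delete the character rather than substitute something else are assumptions for illustration, not necessarily what WebToEpub does.

```javascript
// Hypothetical helper, not taken from the WebToEpub source.
// XML 1.0 allows: #x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF.
// The "u" flag makes the regex work on code points, so valid surrogate
// pairs (e.g. emoji) are kept and lone surrogates are removed.
const INVALID_XML_CHARS =
    /[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/gu;

// Returns the text with invalid characters replaced.
// The default replacement (empty string) is an assumption; a space may
// be preferable when the character was acting as a separator.
function stripInvalidXmlChars(text, replacement = "") {
    return text.replace(INVALID_XML_CHARS, replacement);
}
```

In a browser this could be applied to the `nodeValue` of each text node, for example while walking the document with `document.createTreeWalker(root, NodeFilter.SHOW_TEXT)`.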
Initial attribute white list
- For all elements
  - id
  - class
  - style
- hyperlinks
  - href
- image/svg
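
The white list above could be applied roughly as follows. This is only a sketch under stated assumptions: the names `ALLOWED_FOR_ALL`, `ALLOWED_BY_TAG` and `filterAttributes` are hypothetical, "hyperlinks" is assumed to mean `<a>` elements, and attributes are passed as a plain object so the logic can be shown without a DOM. The image/svg entries are not spelled out in the list, so none are included here.

```javascript
// Hypothetical names and data shapes, for illustration only.
// Attributes allowed on every element, per the list above.
const ALLOWED_FOR_ALL = ["id", "class", "style"];

// Extra attributes allowed per tag. Mapping "hyperlinks" to <a> is an
// assumption; image/svg attributes are not listed in the issue, so
// they are left out rather than guessed.
const ALLOWED_BY_TAG = new Map([
    ["a", ["href"]],
]);

// Returns a new object holding only the white-listed attributes.
function filterAttributes(tagName, attributes) {
    const extra = ALLOWED_BY_TAG.get(tagName.toLowerCase()) || [];
    const allowed = new Set([...ALLOWED_FOR_ALL, ...extra]);
    const kept = {};
    for (const [name, value] of Object.entries(attributes)) {
        if (allowed.has(name.toLowerCase())) {
            kept[name] = value;
        }
    }
    return kept;
}
```

For example, `filterAttributes("a", { href: "ch1.html", onclick: "track()" })` keeps `href` and drops `onclick`. On a real element the same check could drive calls to `element.removeAttribute(name)`.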
Note: this issue is related to https://github.com/dteviot/WebToEpub/issues/118
Nice, they're adding an API for this as well.
https://wicg.github.io/sanitizer-api/