xml-rs icon indicating copy to clipboard operation
xml-rs copied to clipboard

Doctypes are ignored

Open Thiez opened this issue 9 years ago • 11 comments

Currently the library appears to completely ignore doctypes, skipping over them. Is this an intended limitation of xml-rs, or something that should be added?

Thiez avatar Nov 30 '16 23:11 Thiez

No, this is not intentional. I wrote in the readme that I want to support DTDs one day.

netvl avatar Dec 01 '16 19:12 netvl

DTDs have quite complex syntax, though, that's why I didn't implement them from the beginning.

netvl avatar Dec 01 '16 19:12 netvl

DTDs are also a notorious source of security exploits, so parsing them should probably be controlled by both a Cargo flag and a runtime check, both of which are disabled by default.

DemiMarie avatar Dec 05 '16 21:12 DemiMarie

Is it the parsing that is dangerous, or validating the document? Because the former would surprise me a lot, especially in a language such as Rust that does bounds-checking by default.

Thiez avatar Dec 05 '16 21:12 Thiez

@Thiez Parsing of the DTD is not dangerous, but using the DTD in any form exposes one to vulnerabilities, since resolving entities the DTD defines can require exponential time and/or loading of arbitrary files or URLs.

DemiMarie avatar Dec 06 '16 00:12 DemiMarie

I'm assuming you refer to the billion laughs attack? Perhaps the XmlReader could simply not expand entities, or have some kind of configurable expansion limit (e.g. if expanding entities would increase the size of the document by more than a factor of n compared to the original input, we error out). Recognizing an exploding entity shouldn't be hard.

Thiez avatar Dec 06 '16 09:12 Thiez

I am adding support for parsing the doctype, as for librsvg I need support for entities.

I'm familiar with how libxml2 prevents entity expansion attacks, and I'll do something similar for xml-rs.

There are some related things:

  • Parse the doctype
  • Notify the caller about the external subset so it may optionally provide the reader with the DTD
  • Parse the internal subset so we read entity declarations
  • Entity expansion in the appropriate places, with guards for expansion attacks
  • Validate the document from the DTD

So far I'm in the "Parse the doctype" stage. You can see my progress here: https://github.com/federicomenaquintero/xml-rs

federicomenaquintero avatar Aug 16 '17 18:08 federicomenaquintero

@Thiez what is it that you need about parsing the doctype? Validation, entity expansion, something else?

federicomenaquintero avatar Aug 16 '17 18:08 federicomenaquintero

I don't recall anymore :smile: I think I was trying to output xml for creating xhtml for an epub? Something like that. I've since hacked the whole thing together in c#, so I guess this particular issue isn't blocking me anymore. I suppose it would be best to keep this issue open anyway, so this can get fixed?

Thiez avatar Aug 17 '17 17:08 Thiez

There are two parts that are dangerous:

  • External entities. I know of literally no use whatsoever for them other than security exploits.
  • Large entities. This can be used for DoS

On Dec 5, 2016 4:49 PM, "Thiez" [email protected] wrote:

Is it the parsing that is dangerous, or validating the document? Because the former would surprise me a lot, especially in a language such as Rust that does bounds-checking by default.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/netvl/xml-rs/issues/133#issuecomment-264988291, or mute the thread https://github.com/notifications/unsubscribe-auth/AGGWB_C3RXOJRz036-B5Xt3B7V67c-onks5rFIbugaJpZM4LA0A8 .

DemiMarie avatar Aug 19 '17 12:08 DemiMarie

Parsing of internal DTD subset is implemented now in v0.8.9, including predefined entities and protection against the billion laughs attack.

kornelski avatar May 09 '23 14:05 kornelski