xml-rs
Doctypes are ignored
Currently the library appears to completely ignore doctypes, skipping over them. Is this an intended limitation of xml-rs, or something that should be added?
No, this is not intentional. I wrote in the readme that I want to support DTDs one day.
DTDs have quite complex syntax, though; that's why I didn't implement them from the beginning.
DTDs are also a notorious source of security exploits, so parsing them should probably be controlled by both a Cargo flag and a runtime check, both of which are disabled by default.
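Here is a sketch of that two-level switch. It assumes a hypothetical Cargo feature named `dtd` and a hypothetical `DtdOptions` struct, and neither is an xml-rs API. DTD processing would run only when the crate was compiled with the feature and the caller also turned it on at runtime.

```rust
// Cargo.toml side (hypothetical feature name, off unless requested):
//
//     [features]
//     default = []
//     dtd = []

/// Hypothetical runtime options; `Default` leaves DTD processing off.
#[derive(Debug, Clone, Default)]
pub struct DtdOptions {
    pub process_dtd: bool,
}

/// True only if the crate was built with the `dtd` feature AND the caller
/// opted in at runtime. `cfg!` is evaluated at compile time.
pub fn dtd_processing_enabled(opts: &DtdOptions) -> bool {
    cfg!(feature = "dtd") && opts.process_dtd
}
```

A user who forgets either step gets the safe behaviour. Enabling the feature in a dependency does not silently turn DTD handling on for every caller.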
Is it the parsing that is dangerous, or validating the document? The former would surprise me a lot, especially in a language such as Rust that does bounds checking by default.
@Thiez Parsing the DTD is not dangerous in itself, but using the DTD in any form exposes one to vulnerabilities, since resolving the entities a DTD defines can require exponential time and/or loading arbitrary files or URLs.
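To make the "exponential" part concrete, consider the classic billion laughs document. It declares ten entities, and each one references the previous one ten times. The innermost three-byte `lol` therefore ends up repeated 10^9 times, which is about 3 GB of output from well under a kilobyte of input. The sketch below is a hypothetical helper, not part of xml-rs. It computes an entity's expanded length without expanding it, and it memoises each entity so the work stays proportional to the size of the declarations.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not an xml-rs API): returns the fully expanded byte
/// length of entity `root`, or `None` if a reference is undefined,
/// unterminated or recursive. Nothing is actually expanded.
pub fn expanded_len(root: &str, decls: &HashMap<&str, &str>) -> Option<u128> {
    let mut memo: HashMap<String, u128> = HashMap::new();
    let mut in_progress: Vec<String> = Vec::new();
    len_of(root, decls, &mut memo, &mut in_progress)
}

fn len_of(
    name: &str,
    decls: &HashMap<&str, &str>,
    memo: &mut HashMap<String, u128>,
    in_progress: &mut Vec<String>,
) -> Option<u128> {
    // Each entity's length is computed once, so shared sub-entities are cheap.
    if let Some(&n) = memo.get(name) {
        return Some(n);
    }
    // An entity that (indirectly) references itself is an error in XML.
    if in_progress.iter().any(|p| p == name) {
        return None;
    }
    let value = decls.get(name).copied()?;
    in_progress.push(name.to_string());
    let mut total: u128 = 0;
    let mut rest = value;
    while let Some(amp) = rest.find('&') {
        // Literal text before the reference counts as-is.
        total = total.saturating_add(amp as u128);
        let after = &rest[amp + 1..];
        let semi = after.find(';')?;
        let child = len_of(&after[..semi], decls, memo, in_progress)?;
        total = total.saturating_add(child);
        rest = &after[semi + 1..];
    }
    total = total.saturating_add(rest.len() as u128);
    in_progress.pop();
    memo.insert(name.to_string(), total);
    Some(total)
}
```

A parser could compare this number against a limit before expanding anything. In that case the cost of the check never depends on the size of the output.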
I assume you're referring to the billion laughs attack? Perhaps the XmlReader could simply not expand entities, or have some kind of configurable expansion limit (e.g. if expanding entities would increase the size of the document by more than a factor of n compared to the original input, we error out). Recognizing an exploding entity shouldn't be hard.
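Below is a minimal sketch of the configurable limit suggested above. It assumes entity values have already been read into a map. References are expanded recursively, and expansion fails as soon as the output would exceed `factor` times the input length. `expand_with_limit` and `ExpandError` are hypothetical names, not xml-rs APIs. The sketch also ignores character references and the predefined entities such as `&amp;`.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum ExpandError {
    Malformed,         // `&` without a closing `;`
    Undefined(String), // reference to an undeclared entity
    Recursive(String), // entity that refers back to itself
    TooLarge,          // output would exceed the amplification limit
}

/// Hypothetical helper: expands `&name;` references in `input`, failing if
/// the result would be larger than `factor` times the original input.
pub fn expand_with_limit(
    input: &str,
    entities: &HashMap<String, String>,
    factor: usize,
) -> Result<String, ExpandError> {
    let limit = input.len().saturating_mul(factor);
    let mut out = String::new();
    let mut stack = Vec::new();
    expand_into(input, entities, limit, &mut out, &mut stack)?;
    Ok(out)
}

fn expand_into(
    text: &str,
    entities: &HashMap<String, String>,
    limit: usize,
    out: &mut String,
    stack: &mut Vec<String>,
) -> Result<(), ExpandError> {
    let mut rest = text;
    while let Some(amp) = rest.find('&') {
        push_checked(out, &rest[..amp], limit)?;
        let after = &rest[amp + 1..];
        let semi = after.find(';').ok_or(ExpandError::Malformed)?;
        let name = &after[..semi];
        let value = entities
            .get(name)
            .ok_or_else(|| ExpandError::Undefined(name.to_string()))?;
        // Entities currently being expanded must not be re-entered.
        if stack.iter().any(|n| n == name) {
            return Err(ExpandError::Recursive(name.to_string()));
        }
        stack.push(name.to_string());
        expand_into(value, entities, limit, out, stack)?;
        stack.pop();
        rest = &after[semi + 1..];
    }
    push_checked(out, rest, limit)
}

/// Appends `s` only if the output stays within `limit` bytes.
fn push_checked(out: &mut String, s: &str, limit: usize) -> Result<(), ExpandError> {
    if out.len() + s.len() > limit {
        return Err(ExpandError::TooLarge);
    }
    out.push_str(s);
    Ok(())
}
```

The size is checked on every append, so a billion-laughs input fails once the output reaches the limit, instead of after allocating gigabytes. This check only bounds the output size. Entities that expand to empty strings can still cost time without growing the output, so a production guard would probably also cap the number of references processed.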
@Thiez Also XML External Entity (XXE) processing.
I am adding support for parsing the doctype, since librsvg needs support for entities.
I'm familiar with how libxml2 prevents entity expansion attacks, and I'll do something similar for xml-rs.
There are several related pieces of work:
- Parse the doctype
- Notify the caller about the external subset so it may optionally provide the reader with the DTD
- Parse the internal subset so we read entity declarations
- Entity expansion in the appropriate places, with guards for expansion attacks
- Validate the document from the DTD
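In the "parse the internal subset" step, the main job is collecting entity declarations. Below is a minimal, hypothetical sketch of that step, not the xml-rs parser. It handles only internal general entities with quoted literal values. It skips parameter entities (`<!ENTITY % ...>`) and external `SYSTEM`/`PUBLIC` entities, and it does not handle comments or processing instructions that happen to contain `<!ENTITY`.

```rust
use std::collections::HashMap;

/// Hypothetical sketch (not the xml-rs parser): collects internal general
/// entity declarations of the form `<!ENTITY name "value">` or
/// `<!ENTITY name 'value'>` from the text of an internal subset.
pub fn parse_internal_entities(subset: &str) -> HashMap<String, String> {
    let mut entities = HashMap::new();
    let mut rest = subset;
    while let Some(pos) = rest.find("<!ENTITY") {
        rest = rest[pos + "<!ENTITY".len()..].trim_start();
        // Parameter entities (`<!ENTITY % name ...>`) are out of scope here.
        if rest.starts_with('%') {
            continue;
        }
        let name_len = rest.find(char::is_whitespace).unwrap_or(rest.len());
        let name = &rest[..name_len];
        rest = rest[name_len..].trim_start();
        // Only quoted literals are internal entities; SYSTEM/PUBLIC ones are skipped.
        let quote = match rest.chars().next() {
            Some(q @ ('"' | '\'')) => q,
            _ => continue,
        };
        let body = &rest[1..];
        let close = match body.find(quote) {
            Some(i) => i,
            None => break, // unterminated literal: stop scanning
        };
        // XML 1.0: the first declaration of an entity is binding.
        entities
            .entry(name.to_string())
            .or_insert_with(|| body[..close].to_string());
        rest = &body[close + 1..];
    }
    entities
}
```

The value is read up to the matching quote rather than the next `>`, so literals such as `'a>b'` survive intact. Values are stored raw. A real parser would also replace character references inside the literal at this point.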
So far I'm in the "Parse the doctype" stage. You can see my progress here: https://github.com/federicomenaquintero/xml-rs
@Thiez what is it that you need about parsing the doctype? Validation, entity expansion, something else?
I don't recall anymore :smile: I think I was trying to output XML for creating XHTML for an EPUB? Something like that. I've since hacked the whole thing together in C#, so I guess this particular issue isn't blocking me anymore. I suppose it would be best to keep this issue open anyway, so this can get fixed?
There are two parts that are dangerous:
- External entities. I know of literally no use whatsoever for them other than security exploits.
- Large entities. These can be used for denial-of-service (DoS) attacks.
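One way to make the external-entity case "off by default" is for the parser to never fetch anything itself. Instead, it accepts only text that the caller hands over, which fits the earlier idea in this thread of letting the caller supply the external subset. `EntityPolicy`, `resolve_external` and the 64 KiB default are hypothetical choices for illustration, not xml-rs APIs.

```rust
/// Hypothetical policy (not an xml-rs API). `Default` is the safe setting.
#[derive(Debug, Clone)]
pub struct EntityPolicy {
    /// Accept external (SYSTEM/PUBLIC) entities at all? Off by default (XXE).
    pub allow_external: bool,
    /// Largest external entity text accepted, in bytes (DoS guard).
    pub max_entity_len: usize,
}

impl Default for EntityPolicy {
    fn default() -> Self {
        EntityPolicy { allow_external: false, max_entity_len: 64 * 1024 }
    }
}

#[derive(Debug, PartialEq)]
pub enum PolicyError {
    ExternalForbidden,
    NotProvided,
    EntityTooLarge,
}

/// Resolves an external entity by asking the caller-supplied `fetch` closure
/// for its text; this function itself never opens files or URLs.
pub fn resolve_external<F>(
    system_id: &str,
    policy: &EntityPolicy,
    mut fetch: F,
) -> Result<String, PolicyError>
where
    F: FnMut(&str) -> Option<String>,
{
    if !policy.allow_external {
        // Refuse before the caller's fetcher is even consulted.
        return Err(PolicyError::ExternalForbidden);
    }
    let text = fetch(system_id).ok_or(PolicyError::NotProvided)?;
    if text.len() > policy.max_entity_len {
        return Err(PolicyError::EntityTooLarge);
    }
    Ok(text)
}
```

With fetching kept on the caller's side, the application decides which identifiers it is willing to resolve, if any, and the XML library does not make that decision for it.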
(Sent by email in reply to Thiez's comment of Dec 5, 2016: https://github.com/netvl/xml-rs/issues/133#issuecomment-264988291)
Parsing of the internal DTD subset is now implemented in v0.8.9, including predefined entities and protection against the billion laughs attack.