python-xextract
Allow Elements to be passed to parse_*()
Addresses #10
@Mimino666 There's no documentation here yet (I wasn't going to add it until you're happy with what I've done)
Handling `parse()` turned out to be an unexpected quirk: if we only have an Element, there doesn't seem to be a way to know whether the document was parsed as HTML or XML, so we don't know whether to use an XML or an HTML extractor.
We could guess based on the presence (or absence) of a namespace on the Element, but XML snippets can still be parsed without a namespace, so that could lead to unexpected results. It also has the side effect of serializing the Element back to a string for the XML header snooping, which is exactly what we were trying to avoid in the first place (although a check for this could be added).
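To illustrate why the namespace heuristic is unreliable, here is a minimal sketch. It uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml Elements (an assumption; xextract itself works with lxml), and `looks_like_xml()` is a hypothetical helper, not part of xextract.

```python
import xml.etree.ElementTree as ET


def looks_like_xml(element):
    # Hypothetical heuristic: treat a namespaced root tag ("{uri}local") as XML,
    # anything else as HTML.
    return element.tag.startswith("{")


# A namespaced XML document: the root tag carries the namespace URI.
namespaced = ET.fromstring('<feed xmlns="http://www.w3.org/2005/Atom"><title/></feed>')

# A perfectly valid XML snippet that simply has no namespace.
bare = ET.fromstring("<config><item>1</item></config>")

print(looks_like_xml(namespaced))  # True
print(looks_like_xml(bare))        # False: plain XML would be treated as HTML
```

The second case is exactly the "unexpected results" problem: nothing on an un-namespaced Element distinguishes it from one produced by an HTML parser.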
I've opted to force the caller to be explicit: if you want to pass an Element, you must call `parse_html()` or `parse_xml()` instead of `parse()`.
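A rough sketch of the resulting calling convention, under stated assumptions: the three functions below are hypothetical stand-ins that only mirror the names in this PR, they return a `(mode, root)` tuple instead of running an extractor, and `xml.etree.ElementTree` replaces lxml (so `parse_html()` here does not use a real HTML parser). The point is the dispatch: `parse()` can still sniff a string for an XML header, but it refuses an Element because there is nothing left to sniff.

```python
import xml.etree.ElementTree as ET


def _to_root(body):
    # Hypothetical helper: accept an already-parsed Element as-is,
    # otherwise parse the string into one.
    if isinstance(body, ET.Element):
        return body
    return ET.fromstring(body)


def parse_xml(body):
    # Caller has said explicitly that this is XML; strings and Elements both work.
    return ("xml", _to_root(body))


def parse_html(body):
    # Stand-in only: a real implementation would use an HTML parser (e.g. lxml.html).
    return ("html", _to_root(body))


def parse(body):
    # An Element carries no reliable HTML/XML marker, so make the caller choose.
    if isinstance(body, ET.Element):
        raise TypeError(
            "parse() cannot tell whether an Element came from HTML or XML; "
            "call parse_html() or parse_xml() instead"
        )
    # For strings, snoop the XML declaration to pick a mode.
    if body.lstrip().startswith("<?xml"):
        return parse_xml(body)
    return parse_html(body)
```

Usage: `parse('<?xml version="1.0"?><feed/>')` takes the XML path, `parse_xml(element)` accepts a pre-parsed Element, and `parse(element)` raises `TypeError` rather than guessing.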