Document need for XML parser argument when querying case-sensitive XML tags

Open jlamberg opened this issue 11 years ago • 2 comments

I struggled a while trying to query an XML document for specific elements with mixed-case tag names, and only later found that what was mentioned in issue #16 was relevant to my case, too.

Could it be documented (somewhere it is easily seen) that you need to pass a "parser='xml'" argument to PyQuery when trying to match uppercase or mixed-case tag names? I can do a pull request of this for the Tips page, for example, if you'd like that.

Also, the Tips page states: "By default pyquery uses the lxml xml parser". If lxml is the default parser, which XML parser is then used when I explicitly add the above parser argument? In other words, why do I need to add the argument in the first place in order to find mixed-case tag names if parsing is done using an XML parser by default?

jlamberg, Sep 02 '14 20:09

I wrote a Gist as a reminder to myself and to show others what I was trying to do and what I found: https://gist.github.com/jlamberg/0debbc45c9c6178f8c9d.

jlamberg, Sep 02 '14 20:09

As I remember, it's more complex than that. It also depends on the cssselect settings: https://github.com/SimonSapin/cssselect/blob/master/cssselect/xpath.py#L594

You can override this behavior by passing a custom translator: PyQuery(css_translator=your_class)

I don't think this is documented either...

Btw, pull requests are always welcome, especially for a documentation effort.

gawel, Sep 03 '14 16:09