Entity bug
That's my third issue in a row, sorry if that's annoying. I understand it's not easy to make such a library, I would help if I could...
Here's a simple script that reproduces the issue:
from pyquery import PyQuery as pq

p = pq("<span>&lt;foo&gt;&lt;bar&gt;</span>")
print(p("span").html(), p("span").text())
p = pq("<span><b>&lt;foo&gt;</b>&lt;bar&gt;</span>")
print(p("span").html(), p("span").text())
Output:
<foo><bar> <foo><bar>
<b>&lt;foo&gt;</b>&lt;bar&gt; <foo> <bar>
while it should be
&lt;foo&gt;&lt;bar&gt; <foo><bar>
<b>&lt;foo&gt;</b>&lt;bar&gt; <foo><bar>
Basically, if there are entities at the beginning and end of the selected element, they get decoded even in the output of html(), where they should stay escaped. What could be the reason? I tried to track it down, but I really don't understand the code. Thanks.
Perhaps it's here:
if not children:
    return tag.text
tag.text is already decoded, while this function shouldn't return decoded text. Maybe just return it encoded?
I would fix it, but I don't see any function in the project for encoding/decoding entities. How do I do it? I only know that there's one in the built-in html module (html.escape).
I guess there's no such thing because lxml does that.
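A minimal sketch of the "return it encoded" idea, in case it helps. It uses only the standard library, with xml.etree.ElementTree standing in for lxml, and inner_html is a hypothetical helper name, not pyquery's actual code. The point is just that tag.text comes back decoded, so it has to be re-escaped (here with html.escape) before being returned as markup:

```python
import html
import xml.etree.ElementTree as ET


def inner_html(tag):
    """Hypothetical helper: serialize the contents of `tag` as markup."""
    # tag.text holds the decoded text, so re-escape it before emitting it.
    parts = [html.escape(tag.text or "", quote=False)]
    for child in tag:
        # ET.tostring escapes the child's text and also includes its tail.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


span = ET.fromstring("<span>&lt;foo&gt;&lt;bar&gt;</span>")
print(span.text)         # decoded text, as tag.text returns it
print(inner_html(span))  # escaped markup

nested = ET.fromstring("<span><b>&lt;foo&gt;</b>&lt;bar&gt;</span>")
print(inner_html(nested))
```

With this approach the no-children case goes through the same escaping as the case with children, so both would come out escaped. Whether the real fix belongs in that early return or elsewhere is for the maintainers to judge.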