pyquery

html parsing

Open KeenCN opened this issue 6 years ago • 3 comments

Hi, I have a problem when I try to parse an HTML string. Tested in the Python command line:

    from pyquery import PyQuery as pq
    t = pq('<span class="test">&#xe034;.&#xe034;</span>')
    o = t("span.test").html()
    print(o)
    [ . ]

How do I get the original string?

KeenCN, Dec 06 '18 06:12

This works, but it's not what I want:

    from pyquery import PyQuery as pq
    s = '<span class="test">&#xe034;.&#xe034;</span>'
    s = s.replace("&", "&amp;")
    t = pq(s)
    o = t("span.test").html()
    print(o)
    [ &#xe034;.&#xe034; ]

KeenCN, Dec 06 '18 09:12
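
A possible alternative to escaping the input, sketched with the standard library only: let the parser decode the references, then re-encode every non-ASCII character in the output as a hexadecimal character reference. The helper name `reencode_non_ascii` is hypothetical, and `html.unescape` only stands in here for the decoding that the parser performs. This assumes the original markup wrote all non-ASCII characters as hex references; it does not restore named entities such as &lt;, and it cannot tell a literal character from one that was written as a reference.

```python
import re
from html import unescape


def reencode_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with a hex character reference.

    Hypothetical helper, not part of pyquery.
    """
    return re.sub(
        r"[^\x00-\x7f]",
        lambda m: "&#x{:x};".format(ord(m.group())),
        text,
    )


# html.unescape stands in for the decoding an HTML parser performs.
decoded = unescape("&#xe034;.&#xe034;")

# Re-encode the decoded text to get the reference form back.
restored = reencode_non_ascii(decoded)
print(restored)
```

In the session above, the same helper could presumably be applied to the string returned by `t("span.test").html()`.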

I have the same problem as you: https://github.com/gawel/pyquery/issues/218. If the entity is &lt; (which decodes to <), the problem is more serious.

CodingMoeButa, Jun 06 '21 05:06

"&#xe034;" looks like an icon-font code point, which means it has nothing to do with this library. There must be a font file (such as a .woff file) that tells the browser how to render &#xe034;. Without the corresponding font file, or with the wrong one, "&#xe034;" will look weird or wrong. This is commonly used on websites to protect sensitive data (like prices) from crawlers, a technique called font encryption.

liquancss, Sep 29 '21 15:09
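
A quick way to check the icon-font explanation, as a minimal sketch: U+E034 lies in the Unicode Private Use Area (U+E000 to U+F8FF), where code points have no standard glyph and a font is free to assign one. The helper name `is_private_use` is hypothetical.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """Return True if the character has the Unicode category 'Co' (private use)."""
    return unicodedata.category(ch) == "Co"


ch = chr(0xE034)  # the character that &#xe034; decodes to
print(hex(ord(ch)), is_private_use(ch))
```

A private-use code point on its own says nothing about which glyph the page intends; that mapping lives in the site's font file.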