.remove() also removes text after the deleted element

Open Scarfmonster opened this issue 4 years ago • 2 comments

I tried removing an element as a way to exclude some repeated text from a website. I used the following code:

import parsel

html = """
<html><body>
Text before.
<span>Text in.</span>
Text after.
</body></html>
"""

s = parsel.Selector(html)
s.css('span').remove()

print(s.get())

This results in:

<html><body>
Text before.
</body></html>

I would expect only the span to be removed and the text after it to be left as-is. Instead, it always removes the "text after" as well, up to the next element or the end of the removed element's parent.

Scarfmonster, Dec 30 '20 15:12

I can confirm this issue in Parsel 1.6.0.

Gallaecio, Dec 30 '20 19:12

Apparently this is a byproduct of how lxml stores text: the text that follows an element is stored as part of that preceding element (its tail), so removing the element also removes the text. I tried to mitigate this in PR #207.
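
To illustrate the behaviour described above, here is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in, on the assumption that it is close enough to lxml for this purpose since both expose the same text/tail model. The helper name remove_keeping_tail is hypothetical and is not part of parsel or lxml, and this is not necessarily the approach taken in the PR. It moves the removed element's tail onto the previous sibling (or onto the parent's text) before detaching the element.

```python
import xml.etree.ElementTree as ET

MARKUP = "<body>Text before. <span>Text in.</span> Text after.</body>"


def remove_keeping_tail(parent, child):
    """Hypothetical helper: remove child from parent but keep the text after it."""
    children = list(parent)
    index = children.index(child)
    tail = child.tail or ""
    if index > 0:
        # Append the trailing text to the previous sibling's tail.
        previous = children[index - 1]
        previous.tail = (previous.tail or "") + tail
    else:
        # No previous sibling: the text belongs to the parent's leading text.
        parent.text = (parent.text or "") + tail
    parent.remove(child)


# Plain removal: the tail is an attribute of the span, so it goes away too.
naive_body = ET.fromstring(MARKUP)
naive_span = naive_body.find("span")
tail_text = naive_span.tail
naive_body.remove(naive_span)
naive_result = ET.tostring(naive_body, encoding="unicode")

# Tail-preserving removal.
body = ET.fromstring(MARKUP)
remove_keeping_tail(body, body.find("span"))
result = ET.tostring(body, encoding="unicode")

print(repr(tail_text))
print(naive_result)
print(result)
```

With lxml the same idea could use getparent() and getprevious() on the element instead of passing the parent explicitly.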

Scarfmonster avatar Jan 04 '21 21:01 Scarfmonster