parsel
.remove() also removes text after the deleted element
I tried removing an element as a way to exclude some repeated text from a website. I used the following code:
import parsel
html = """
<html><body>
Text before.
<span>Text in.</span>
Text after.
</body></html>
"""
s = parsel.Selector(html)
s.css('span').remove()
print(s.get())
This results in:
<html><body>
Text before.
</body></html>
I would expect only the span to be removed and the text after it to be left as-is. Instead, .remove() always drops the "text after" as well, up to the next element or the end of the removed element's parent, whichever comes first.
I can confirm this issue in Parsel 1.6.0.
Apparently this is a byproduct of how lxml stores text: the text following an element is kept as that element's tail, so removing the element also removes the text. I tried mitigating this in PR #207.
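For anyone who needs a workaround in the meantime, here is a minimal sketch of the idea: move the tail text onto the previous sibling (or the parent's text) before removing the element. It uses the standard library's xml.etree.ElementTree, which has the same text/tail model that lxml follows. The helper name remove_keeping_tail is made up for this illustration; it is not part of Parsel or lxml, and it is not necessarily what PR #207 does.

```python
import xml.etree.ElementTree as ET


def remove_keeping_tail(parent, element):
    """Remove `element` from `parent` but keep the text that follows it.

    Hypothetical helper for illustration only.
    """
    siblings = list(parent)
    index = siblings.index(element)
    tail = element.tail or ""
    if index > 0:
        # Attach the tail to the previous sibling's tail.
        previous = siblings[index - 1]
        previous.tail = (previous.tail or "") + tail
    else:
        # No previous sibling: the tail becomes part of the parent's text.
        parent.text = (parent.text or "") + tail
    # The removed element takes its own tail with it, so nothing is duplicated.
    parent.remove(element)


body = ET.fromstring("<body>Text before. <span>Text in.</span>Text after.</body>")
span = body.find("span")
print(repr(span.tail))  # the text after the span lives on the span itself

remove_keeping_tail(body, span)
result = ET.tostring(body, encoding="unicode")
print(result)
```

The same function should also work on lxml elements, since it only relies on the ElementTree-style API (list(parent), .text, .tail, parent.remove()), but I have only sketched it against the standard library here.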