Problems with Elements class array methods updating the DOM

Open 821938089 opened this issue 1 year ago • 8 comments

Since 1.17.1, the array methods of the Elements class also update the DOM. This behavior is not what I expected and could cause problems (e.g. accidentally breaking the DOM).

To be honest, I don't think modifying the array should update the DOM; this behavior seems counter-intuitive to me.

If this cannot be changed back, I suggest adding an ElementList class that behaves like the previous Elements class. The inheritance could be ArrayList<Element> -> ElementList -> Elements, with each of the two classes providing a method to convert to the other.
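
For illustration only, here is a minimal sketch of that proposed split using plain Java collections. Node, NodeList, and CoupledNodes are hypothetical stand-ins (not jsoup classes) for Element, ElementList, and Elements; the sketch only models clear(), and is not how jsoup is implemented.

```java
import java.util.ArrayList;
import java.util.List;

final class ElementListSketch {
    /** Hypothetical stand-in for a DOM node; not jsoup's Element. */
    static class Node {
        final String name;
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        Node append(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        /** Removes this node from its parent's child list, if it has a parent. */
        void detach() {
            if (parent != null) {
                parent.children.remove(this);
                parent = null;
            }
        }
    }

    /** Plain list (the suggested ElementList role): edits never touch the tree. */
    static class NodeList extends ArrayList<Node> {
        NodeList(List<Node> nodes) { super(nodes); }

        CoupledNodes toCoupled() { return new CoupledNodes(this); }
    }

    /** Coupled list (the Elements role since 1.17.1): clear() also detaches nodes. */
    static class CoupledNodes extends NodeList {
        CoupledNodes(List<Node> nodes) { super(nodes); }

        @Override
        public void clear() {
            for (Node n : this) n.detach();
            super.clear();
        }

        NodeList toPlain() { return new NodeList(this); }
    }

    public static void main(String[] args) {
        Node root = new Node("div").append(new Node("p")).append(new Node("p"));

        NodeList plain = new NodeList(root.children);
        plain.clear();
        System.out.println(root.children.size()); // 2: the tree is untouched

        CoupledNodes coupled = new NodeList(root.children).toCoupled();
        coupled.clear();
        System.out.println(root.children.size()); // 0: the nodes were detached
    }
}
```

The conversion methods copy the list, so code that wants the old behavior can switch to the plain type once and keep its existing clear() and add() calls.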

821938089 avatar Jan 10 '24 08:01 821938089

I don't think I follow what circumstances would break the DOM. What are you doing that modifies the Elements where you don't want those modifications to go back to the DOM?

Perhaps there could be a detach() method on Elements that decouples it from the DOM.

jhy avatar Jul 01 '24 07:07 jhy

It is a reading app that uses a custom set of rule executors to extract book information, chapter lists, content, etc. from websites.

Inside the rule executor, an Elements instance stores the acquired elements, and sometimes that array needs to be cleared and refilled.

With the new version of jsoup, clearing the array also deletes those elements from the DOM, which destroys the DOM.

821938089 avatar Jul 01 '24 08:07 821938089

OK, I guess I'm not clear on why you'd clear and re-add elements vs. using a different variable or re-selecting. It's hard to suggest another approach without more detail. But as far as I can guess, a detach() method would be suitable, right? It could potentially be used in combination with clone().

jhy avatar Jul 01 '24 08:07 jhy

What do you mean by using different variables? Re-selecting is not possible; the selection has to go through the custom rule executor.

I wish there were a global switch that controlled this behavior so that I didn't have to modify the code everywhere. For example: Jsoup.enableElementsMutableDom(false)

There is another problem: some third-party libraries that rely on jsoup, such as JsoupXpath, are affected too, because they perform operations similar to mine.

821938089 avatar Jul 01 '24 08:07 821938089

Can you link me the example in JsoupXpath so I can see the context?

jhy avatar Jul 01 '24 09:07 jhy

https://github.com/zhegexiaohuozi/JsoupXpath/blob/master/src/main/java/org/seimicrawler/xpath/core/axis/DescendantSelector.java

821938089 avatar Jul 01 '24 09:07 821938089

Hi, any updates here? Is it possible to provide a global switch?

821938089 avatar Jul 11 '24 23:07 821938089

@821938089 when I have an update I will definitely post it here.

Currently I am leaning towards a method like deselect(), which would remove the Element from the Elements array without also removing it from the DOM. I generally prefer explicit code to configuration switches. It may be a little more work to update existing code, but it will be clearer for other users in the future. And running Find Usages on remove() will make it easy to find and update the call sites that need changing.

jhy avatar Jul 15 '24 06:07 jhy

OK done -- I added elements.deselect(object), elements.deselect(index), and elements.asList().

jhy avatar Mar 11 '25 00:03 jhy

Is it possible to provide new methods for all the old array operations? I also need clear().

821938089 avatar Mar 11 '25 00:03 821938089

OK, added elements.deselectAll()
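
A minimal sketch of the difference, assuming a jsoup release on the classpath that already includes deselectAll(). The DeselectDemo class, its method names, and the sample HTML are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

final class DeselectDemo {
    // Illustrative input; any document with a few <p> elements would do.
    private static final String HTML = "<div><p>One</p><p>Two</p></div>";

    /** Counts the <p> elements left in the DOM after clear() on a selection. */
    static int paragraphsAfterClear() {
        Document doc = Jsoup.parse(HTML);
        Elements ps = doc.select("p");
        ps.clear(); // since 1.17.1 this also removes the elements from the DOM
        return doc.select("p").size();
    }

    /** Counts the <p> elements left in the DOM after deselectAll() on a selection. */
    static int paragraphsAfterDeselectAll() {
        Document doc = Jsoup.parse(HTML);
        Elements ps = doc.select("p");
        ps.deselectAll(); // empties the list only; the DOM is left alone
        return doc.select("p").size();
    }

    public static void main(String[] args) {
        System.out.println("after clear(): " + paragraphsAfterClear());
        System.out.println("after deselectAll(): " + paragraphsAfterDeselectAll());
    }
}
```

Existing code that called clear() only to reuse the list can switch to deselectAll(); deselect(index) and deselect(object) play the same role for remove().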

jhy avatar Mar 11 '25 00:03 jhy