Feature request: print html tag of the node (not including its children) for lexbor engine

Open gshashank84 opened this issue 3 years ago • 2 comments

To check whether two nodes are equal, we compare their HTML. However, the output also includes the markup of each node's children. Please add a method that prints the HTML string of a single node only (i.e. not including its children).

gshashank84 · Aug 18 '22 05:08
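
Until such a method exists, one possible workaround is to rebuild only the opening tag from a node's tag name and attributes. The sketch below is a plain-Python illustration, not part of selectolax: `opening_tag` is a hypothetical helper, and it assumes the caller passes in a tag name and an attribute mapping (for example the node's `tag` and `attributes`), where a value of `None` stands for a valueless attribute.

```python
import html


def opening_tag(tag, attributes):
    """Build the opening tag for one element, ignoring its children.

    Hypothetical helper: `tag` is the element name and `attributes`
    maps attribute names to string values (None = no value).
    """
    parts = [tag]
    for name, value in attributes.items():
        if value is None:
            # Valueless attribute such as `hidden` or `disabled`.
            parts.append(name)
        else:
            # Escape quotes and angle brackets inside the value.
            parts.append(f'{name}="{html.escape(value, quote=True)}"')
    return "<" + " ".join(parts) + ">"


print(opening_tag("div", {"id": "main", "hidden": None}))
```

Two such strings can then be compared without serializing each node's whole subtree. Note that this only tells you whether two nodes have the same tag and attributes, not whether they are the same node in the tree.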

Also, can we make the __eq__ method of LexborNode more performant? Since the equality operator internally compares the HTML of the nodes, comparing two nodes of a large DOM tree takes a huge amount of computation time.

gshashank84 · Aug 18 '22 13:08

Also, can we make the __eq__ method of LexborNode more performant? Since the equality operator internally compares the HTML of the nodes, comparing two nodes of a large DOM tree takes a huge amount of computation time.

Yeah, this is an old problem, but there is no easy way to solve it, since the lexbor/modest engines do not have internal IDs or anything similar that would let us compare nodes directly. We could compare their locations in memory, but we would still need to perform the full comparison on a miss. Using the memory address as the main check could also lead to a race condition.

rushter · Aug 19 '22 09:08
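
The trade-off described above (address check first, full comparison on a miss) can be sketched in plain Python. This is only an illustration under stated assumptions, not how selectolax implements `__eq__`: `SketchNode` is a hypothetical stand-in class, `address` mimics the pointer of the underlying C node, and the `serializations` counter exists only to show how often the expensive path runs. The race-condition concern is not modeled here.

```python
class SketchNode:
    """Hypothetical stand-in for a parsed node (not a selectolax class)."""

    serializations = 0  # counts how often the expensive path is taken

    def __init__(self, address, markup):
        self.address = address  # assumed: pointer of the underlying C node
        self._markup = markup

    @property
    def html(self):
        # Stands in for serializing the node and all of its children.
        SketchNode.serializations += 1
        return self._markup

    def __eq__(self, other):
        if not isinstance(other, SketchNode):
            return NotImplemented
        if self.address == other.address:
            return True  # hit: same underlying node, nothing serialized
        # Miss: the full comparison is still required.
        return self.html == other.html


a = SketchNode(0x1000, "<p>x</p>")
same = SketchNode(0x1000, "<p>x</p>")
other = SketchNode(0x2000, "<p>x</p>")

print(a == same, SketchNode.serializations)
print(a == other, SketchNode.serializations)
```

The fast path only helps when both operands wrap the same underlying node. Comparing two distinct nodes still costs two serializations, which is the "big check in case of a miss" mentioned above.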