
Store parser position information in dom nodes

Open: ejsmith opened this issue 9 years ago • 1 comment

I have an application that needs to know where each node starts and ends relative to the source HTML content. I know that adding this information would be a bit of a memory hit to the DOM structure, but it could also be pretty valuable. Any chance you would consider adding it?

ejsmith, Aug 06 '14 14:08

There was another request for a similar feature, and I shot it down on the basis of resource use. Adding a reference to each node costs 8 bytes per node, which can make a big difference on large structures (or, more commonly, in high-volume situations, which are not uncommon in web scraping applications). So I've tried to keep the footprint of the node as small as possible.
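To put the 8-bytes-per-node figure in perspective, here is a small back-of-the-envelope calculation. The workload numbers (100,000 nodes per page, 500 pages held in memory at once) are made up for illustration and do not come from CsQuery measurements.

```python
# One 8-byte (64-bit) reference per node, the figure quoted above.
BYTES_PER_REFERENCE = 8


def extra_bytes(node_count: int, fields_per_node: int = 1) -> int:
    """Extra memory from adding `fields_per_node` 8-byte fields to every node."""
    return node_count * fields_per_node * BYTES_PER_REFERENCE


# Hypothetical workload: these two numbers are assumptions, not measurements.
NODES_PER_PAGE = 100_000
PAGES_IN_MEMORY = 500

per_page = extra_bytes(NODES_PER_PAGE)
total = per_page * PAGES_IN_MEMORY

print(f"extra per page: {per_page:,} bytes")
print(f"extra in total: {total:,} bytes")
```

Under those assumed numbers, a single extra reference adds 800,000 bytes per page and 400,000,000 bytes across the whole batch.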

However, there is no reason you couldn't create a structure that inherits from the core CsQuery classes DomObject and DomElement. I haven't actually looked at the DOM code in a long time, so it's possible this could be difficult, but the HTML parser itself is completely DOM-agnostic, and it would be fairly straightforward to implement a tree builder that uses any type of node you like. If your nodes inherit from the core CsQuery classes, they should work just fine in CsQuery as well. The major caveat is that you can't simply create something that implements the interfaces; you will actually need to inherit from DomObject and DomElement, since the code is tightly coupled to these classes. But everything inherits from DomObject; if you use these as base classes, and the appropriate interfaces for other types of DOM nodes such as text, it should work fine.
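As a rough illustration of the subclassing idea above, here is a language-neutral sketch in Python (CsQuery itself is a C# library). Everything in it is a hypothetical stand-in: `Node` and `Element` play the role of the core base classes, `PositionedElement` is the derived node that carries source offsets, and `build_tree` with its `(kind, name, offset)` event format is an assumed shape for a tree builder, not CsQuery's actual API.

```python
from typing import List, Optional, Tuple


class Node:
    """Stand-in for a core base class such as DomObject (hypothetical)."""

    def __init__(self) -> None:
        self.parent: Optional["Element"] = None


class Element(Node):
    """Stand-in for a core element class such as DomElement (hypothetical)."""

    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name
        self.children: List[Node] = []

    def append(self, child: Node) -> None:
        child.parent = self
        self.children.append(child)


class PositionedElement(Element):
    """Derived node that adds source offsets; the base class is untouched."""

    def __init__(self, name: str, start: int) -> None:
        super().__init__(name)
        self.start = start              # offset where the start tag begins
        self.end: Optional[int] = None  # offset just past the end tag


Event = Tuple[str, str, int]  # (kind, tag name, source offset): assumed format


def build_tree(events: List[Event], source_length: int) -> PositionedElement:
    """Tiny tree builder that creates PositionedElement nodes from parser events."""
    root = PositionedElement("#root", 0)
    root.end = source_length
    stack = [root]
    for kind, name, offset in events:
        if kind == "open":
            node = PositionedElement(name, offset)
            stack[-1].append(node)
            stack.append(node)
        elif kind == "close" and len(stack) > 1:
            stack.pop().end = offset
    return root


def count_elements(node: Element) -> int:
    """Written against the base type only, yet works on the derived nodes."""
    return 1 + sum(count_elements(c) for c in node.children if isinstance(c, Element))


html = "<div><p>hi</p></div>"
events = [("open", "div", 0), ("open", "p", 5), ("close", "p", 14), ("close", "div", 20)]
root = build_tree(events, len(html))
div = root.children[0]
p = div.children[0]

print(html[p.start:p.end])   # slice of the source covered by the <p> node
print(count_elements(root))  # base-class code still traverses the tree
```

The point of the sketch is that only applications that opt into the derived node type pay for the extra fields, while code written against the base types keeps working unchanged.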

jamietre, Aug 18 '14 13:08