GQ
Examples
Need examples of how to use this library in C++. Thanks in advance.
@kirillv There are benchmarks that illustrate usage. There's also usage in the HttpFilteringEngine library. What more are you looking for?
@TechnikEmpire It would be great if there were separate C++ examples of how to find, extract, and modify HTML content (with your selectors). I found that it is possible to serialize back to HTML in C++ code (Gumbo doesn't have such a feature), but there are no separate examples of this. Thanks in advance!
@kirillv True, although the serialization is adapted from an official sample in the Gumbo repo. That single file is actually under a different license, Apache 2.0 I believe, but the original author Kevin Hendrix gave me permission to take it under the MIT (he gave that permission in a bug thread I opened on the Gumbo repo).
Anyway, you're right, because the serialization is actually where you perform mutations. You use the selectors to grab things and then initiate the serialization. During that serialization, your selected nodes will be handed back to you through a simple interface where you can do one of the following (see the sketch after this list):
- Modify their values.
- Return nothing, effectively deleting the node and all of its children.
- Inject completely different, hand-written HTML instead.
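A single pass looks roughly like this. This is a minimal sketch only: the header paths, the `gq::Document` / `gq::NodeMutationCollection` / `gq::Serializer` names, and the callback signature are approximations of the interface described above, not copied from the real headers, so check the benchmarks and the library source for the exact API.

```cpp
// Rough sketch of select-then-mutate-during-serialization.
// NOTE: the gq:: names and header paths below are approximations,
// not verbatim API -- verify against the actual GQ headers.
#include <iostream>
#include <string>

#include <Document.hpp>
#include <NodeMutationCollection.hpp>
#include <Serializer.hpp>

int main()
{
    std::string html =
        "<html><body><div class=\"ads\">junk</div><p>Keep me.</p></body></html>";

    // Parse once. This is also where the selection indexes are compiled.
    auto doc = gq::Document::Create();
    doc->Parse(html);

    // Grab the nodes to mutate with a selector.
    gq::NodeMutationCollection collection;
    doc->Each("div.ads",
        [&collection](const gq::Node* node)
        {
            collection.Add(node);
        });

    // Serialize. Collected nodes are handed back through the mutation
    // interface; here they are simply dropped from the output, but you
    // could also modify their values or inject replacement HTML.
    std::string output = gq::Serializer::Serialize(doc.get(), &collection);

    std::cout << output << std::endl;
    return 0;
}
```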
Anyway, I will get to this and StahpIt/HttpFilteringEngine eventually; I'm just swamped with private work right now.
Update
One more thing. This mutation API is rather limited in the sense that it's meant for one-off transformations of parsed HTML. It's not fully dynamic; you can't keep applying sequential mutations to the same document. To do that, you'd need to work in passes: serialize in one pass, create a new document from that serialized string, rinse and repeat (see the sketch below).
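In sketch form, sequential mutations end up as a loop of full parse/serialize cycles. Again, the `gq::` names here are approximations (see the note on the earlier sketch), and each pass pays the full parse and index-compilation cost:

```cpp
#include <string>

#include <Document.hpp>
#include <NodeMutationCollection.hpp>
#include <Serializer.hpp>

// One full pass: parse, select, collect, serialize.
// Approximate names, as in the earlier sketch.
std::string MutatePass(const std::string& html, const std::string& selector)
{
    auto doc = gq::Document::Create();
    doc->Parse(html); // indexes are compiled from scratch every pass

    gq::NodeMutationCollection collection;
    doc->Each(selector,
        [&collection](const gq::Node* node) { collection.Add(node); });

    return gq::Serializer::Serialize(doc.get(), &collection);
}

int main()
{
    std::string html = "<html><body>...</body></html>"; // your input here

    // Each mutation round is a separate parse/serialize cycle, because
    // mutations are never reflected back into the compiled indexes.
    html = MutatePass(html, "div.ads");
    html = MutatePass(html, "script");

    return 0;
}
```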
The reason for this is that some really heavy-duty hashmaps and such are constructed when you parse a document, and this happens only once. It's slightly expensive, and static (once compiled for a document, it doesn't get recompiled). The point is that it speeds up selection dramatically: all tag names and tag property keys and values are indexed through `unordered_map` and `map`, and in a scoped manner, so that complex selectors are blazing fast (this is where all the speed comes from). The only downside is that it's rigid, built only once per parsed document. Mutations cannot currently be reflected in this tree.