Function to extract every tag available
Hello,
My problem is that I don't know the exact structure of the HTML files in advance, so I can't spell out the whole path with .back().
Let's say I have this html file
<!DOCTYPE html>
<html>
<body>
<h1>My First Heading</h1>
<div>
<p>My first paragraph.</p>
</div>
<p>Another p-Tag</p>
</body>
</html>
I now want to extract every p tag. This works well as long as they all sit directly under the same parent tag. But when I use
for (auto& p : doc.node()["body"].back()["p"])
{
    std::cout << p.front().text() << std::endl;
}
it does not find the p tag inside the div element. So my question is whether this library has a function that returns every p tag, no matter how deeply it is nested. Unfortunately there is almost no documentation, which makes the library a little hard to use :)
Have a nice day, Noctera
Hello @noctera, I'm always extremely happy to learn that someone is trying to use my projects. 🤣
Unfortunately, there's no such feature at the moment. This library was originally written for my personal use, but even I have given up on parsing HTML with C++, so the library is not actively maintained.
However, if you insist on using this library, I suggest implementing the feature yourself, since the library gives you the freedom to do that. The simplest implementation would be
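// Calls f on node and then, depth-first, on every descendant of node.
template <typename F>
void for_each(html_node const& node, F&& f)
{
    f(node);
    if (node.type() == html_node_type::node)
    {
        for (auto& n : node)
        {
            // Pass f as an lvalue: forwarding it inside the loop could move from it more than once.
            for_each(n, f);
        }
    }
}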
and then
for_each(doc.node()["body"].back(), [](html_node const& n) {
    if (n.tag() == "p")
        std::cout << n.front().text() << std::endl;
});
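For anyone who wants to try the idea without the library, here is a minimal, self-contained sketch of the same depth-first walk. The `Node` struct and the names `walk`, `collect_text`, and `sample_body` are hypothetical stand-ins invented for this illustration. They are not part of the library's API, and the sketch assumes a C++17 compiler.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the library's html_node (NOT its real API).
struct Node {
    std::string tag;             // element name; empty for text nodes
    std::string text;            // content; only used by text nodes
    std::vector<Node> children;  // child nodes in document order
};

// Pre-order walk: visit the node itself, then each child subtree.
template <typename F>
void walk(Node const& node, F& f) {
    f(node);
    for (Node const& child : node.children) {
        walk(child, f);
    }
}

// Collects the first text child of every element named `tag`,
// regardless of how deeply the element is nested under `root`.
std::vector<std::string> collect_text(Node const& root, std::string const& tag) {
    std::vector<std::string> out;
    auto visit = [&](Node const& n) {
        // Skip elements without children so front() is always valid.
        if (n.tag == tag && !n.children.empty()) {
            out.push_back(n.children.front().text);
        }
    };
    walk(root, visit);
    return out;
}

// Builds the <body> of the sample document from the question by hand.
Node sample_body() {
    auto make_text = [](std::string s) { return Node{"", std::move(s), {}}; };
    return Node{"body", "", {
        Node{"h1", "", {make_text("My First Heading")}},
        Node{"div", "", {Node{"p", "", {make_text("My first paragraph.")}}}},
        Node{"p", "", {make_text("Another p-Tag")}},
    }};
}
```

Calling `collect_text(sample_body(), "p")` on this hand-built tree yields both paragraphs in document order, including the one nested inside the div. The check for an empty child list is a precaution added in this sketch. Whether the real library needs a similar guard before `front()` is not something the thread confirms.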