newspaper4k
extract tags from breadcrumb
Issue by Ennoriel
Tue Mar 5 20:25:27 2019
Originally opened as https://github.com/codelucas/newspaper/issues/685
Hello there, I suggest that when no tags have been found on the page, the library should look for a breadcrumb and use its elements as tags. An example:
<ul class="breadcrumb">
<li class="breadcrumb__parent">
<a class="logo__societe logo__societe--article" href="/societe/">Société</a>
</li>
<li class="breadcrumb__child">
<a class="logo__prisons logo__prisons--article" href="/prisons/">Prisons</a>
</li>
</ul>
It could work like this:
if there is a ul (or another element?) whose class or id contains "breadcrumb":
    take its innermost children, i.e. the a tags that have an href attribute and non-empty text, and use their text as tags
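Below is a minimal sketch of this heuristic. It uses only Python's standard-library `html.parser` and is not part of newspaper4k's API. `BreadcrumbParser`, `breadcrumb_tags` and `tags_with_fallback` are hypothetical names chosen for illustration. It reads "last html children" as the innermost `<a href>` links inside any element whose `class` or `id` contains "breadcrumb". It also assumes the page's existing tags arrive as a set.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they are not tracked on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class BreadcrumbParser(HTMLParser):
    """Collect the text of <a href> links nested in a "breadcrumb" element."""

    def __init__(self):
        super().__init__()
        self.tags = []          # link texts, in document order
        self._stack = []        # (tag_name, is_inside_breadcrumb) per open element
        self._link_text = None  # text chunks while inside a qualifying <a>

    def _inside_breadcrumb(self):
        return bool(self._stack) and self._stack[-1][1]

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        attrs = dict(attrs)
        marker = f"{attrs.get('class') or ''} {attrs.get('id') or ''}".lower()
        inside = self._inside_breadcrumb() or "breadcrumb" in marker
        self._stack.append((tag, inside))
        # Only links with an href inside a breadcrumb are candidates.
        if tag == "a" and inside and attrs.get("href"):
            self._link_text = []

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or all(name != tag for name, _ in self._stack):
            return
        # Pop up to the matching open tag, tolerating unclosed children.
        while self._stack and self._stack.pop()[0] != tag:
            pass
        if tag == "a" and self._link_text is not None:
            text = " ".join("".join(self._link_text).split())
            if text:  # skip icon-only links without text
                self.tags.append(text)
            self._link_text = None

    def handle_data(self, data):
        if self._link_text is not None:
            self._link_text.append(data)


def breadcrumb_tags(html):
    """Return breadcrumb link texts, de-duplicated, in breadcrumb order."""
    parser = BreadcrumbParser()
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.tags))


def tags_with_fallback(existing_tags, html):
    """Keep the page's own tags; fall back to the breadcrumb only if there are none."""
    if existing_tags:
        return set(existing_tags)
    return set(breadcrumb_tags(html))


SAMPLE = """<ul class="breadcrumb">
<li class="breadcrumb__parent">
<a class="logo__societe logo__societe--article" href="/societe/">Société</a>
</li>
<li class="breadcrumb__child">
<a class="logo__prisons logo__prisons--article" href="/prisons/">Prisons</a>
</li>
</ul>"""

print(breadcrumb_tags(SAMPLE))  # ['Société', 'Prisons']
```

Matching on "breadcrumb" as a substring of `class` or `id` is deliberately loose, so `breadcrumb`, `breadcrumbs` and BEM-style names like `breadcrumb__child` all qualify. A stricter version might only accept exact class names.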
If I have some time, I'll try to implement it.