nutch
NUTCH-1749 Optionally exclude title from content field
Hello,
Following the description and the comment in the Jira issue, I propose a patch that adds the ability to exclude some tags in the HTML and Tika parsers.
Regards,
Steeve
Hi @steeveb972, this seems like a really useful patch. Are you able to update it so we can try to get it into master?
Hi, it has been a while ;-) I will have a look at it this weekend.