NUTCH-1749 Optionally exclude title from content field

Open steeveb972 opened this issue 7 years ago • 2 comments

Hello,

Following the description and the comment in the Jira issue NUTCH-1749, I propose a patch that adds the ability to exclude some tags in the HTML and Tika parsers.

Regards,

Steeve

steeveb972 avatar Feb 12 '18 22:02 steeveb972

Hi @steeveb972, this seems like a really useful patch. Are you able to update it so we can try to get it into master?

lewismc avatar Jan 08 '21 20:01 lewismc

Hi, it has been a while ;-) I will have a look at it this weekend.

steeveb972 avatar Jan 12 '21 21:01 steeveb972