[Feature Request]: Apache Hop Plugin for Taxonomy-Based Text Extraction and Labelling
What would you like to happen?
An Apache Hop plugin that takes a predefined taxonomy (a categorisation scheme) and automatically extracts and classifies snippets of text around its key terms. For each occurrence of a term, the plugin would capture a configurable number of words on either side, extend the snippet to complete sentences, and label it according to the taxonomy.
Use Case: The plugin would help users who need to focus on specific sections of large documents, such as passages containing known bias terms, industry-specific terms, product names, or other key phrases. Instead of reviewing entire documents manually, users would get the relevant text segments extracted and labelled automatically. The labelled snippets could also support semi-supervised learning, where preliminary labelled data guides the analysis of unlabelled data.
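To make the requested behaviour concrete, here is a minimal sketch of the extraction logic in Python. It is not tied to the Hop plugin API and is only one possible reading of the request. The function names, the `window` parameter, the dictionary layout of the taxonomy, and the naive sentence splitter are all assumptions made for illustration.

```python
import re


def sentence_spans(text):
    """Return (start, end) offsets of sentences.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    This naive rule mis-splits abbreviations such as "e.g."; a real plugin
    would likely need a proper sentence detector.
    """
    spans, start = [], 0
    for m in re.finditer(r"[.!?]+\s+", text):
        end = m.start() + len(m.group().rstrip())
        spans.append((start, end))
        start = m.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def extract_snippets(text, taxonomy, window=5):
    """Extract sentence-aligned snippets around taxonomy terms.

    taxonomy: hypothetical layout {label: [term, ...]}.
    window:   number of words to keep on each side of a matched term
              before rounding outwards to whole sentences.
    """
    words = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    sentences = sentence_spans(text)
    results = []
    for label, terms in taxonomy.items():
        for term in terms:
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for hit in pattern.finditer(text):
                # First and last word that overlap the matched term.
                first = next(i for i, (s, e) in enumerate(words) if e > hit.start())
                last = max(i for i, (s, e) in enumerate(words) if s < hit.end())
                # Character range of the word window around the term.
                lo = words[max(0, first - window)][0]
                hi = words[min(len(words) - 1, last + window)][1]
                # Round outwards to every sentence the window touches.
                covered = [(s, e) for s, e in sentences if s < hi and e > lo]
                snippet = text[covered[0][0]:covered[-1][1]]
                results.append({"label": label, "term": term, "snippet": snippet})
    return results


# Illustrative input only; the taxonomy content is made up.
sample = (
    "Intro sentence here. The new widget ships in May. "
    "It costs ten dollars. Unrelated closing remark."
)
for row in extract_snippets(sample, {"product": ["widget"]}, window=2):
    print(row["label"], "|", row["snippet"])
```

With `window=2` the word window stays inside one sentence, so only that sentence is returned. A larger window that reaches into neighbouring sentences pulls those sentences in whole. In a Hop transform, each result would presumably become an output row with the label, the matched term, and the snippet as fields.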
Issue Priority
Priority: 3
Issue Component
Component: Transforms