[Feature Request]: Apache HOP Plugin for Taxonomy-Based Text Extraction and Labelling

Open ep9io opened this issue 4 months ago • 0 comments

What would you like to happen?

An Apache Hop plugin that uses a predefined taxonomy (a categorisation scheme) to automatically extract and classify snippets of text around key terms. The plugin would capture a specified number of words on either side of each term, extend the snippet to complete sentence boundaries, and label the text according to the taxonomy.
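To make the requested behaviour concrete, here is a rough sketch of the extraction logic in Python. It is only an illustration of the idea, not a proposed implementation for the transform (which would be written in Java against the Hop plugin API). The names `Snippet`, `sentence_spans`, `extract_snippets`, and the `window` parameter are hypothetical, and the sentence splitter is deliberately naive (it treats `.`, `!`, or `?` followed by whitespace as a sentence end).

```python
import re
from typing import Dict, List, NamedTuple, Tuple


class Snippet(NamedTuple):
    label: str  # taxonomy category the term belongs to
    term: str   # taxonomy term that was matched
    text: str   # extracted text, rounded to whole sentences


def sentence_spans(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) character spans of sentences (naive splitter)."""
    spans, start = [], 0
    for m in re.finditer(r"[.!?]+(?:\s+|$)", text):
        spans.append((start, m.end()))
        start = m.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def extract_snippets(text: str, taxonomy: Dict[str, List[str]],
                     window: int = 5) -> List[Snippet]:
    """Extract `window` words either side of each term, rounded to sentences."""
    words = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    sents = sentence_spans(text)
    results = []
    for label, terms in taxonomy.items():
        for term in terms:
            # Assumes terms start and end with word characters.
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for m in pattern.finditer(text):
                # First and last word positions overlapping the match.
                first = next(i for i, (_, e) in enumerate(words) if e > m.start())
                last = max(i for i, (s, _) in enumerate(words) if s < m.end())
                lo = words[max(0, first - window)][0]
                hi = words[min(len(words) - 1, last + window)][1]
                # Round outward to the enclosing sentence boundaries.
                start = max(s for s, _ in sents if s <= lo)
                end = min(e for _, e in sents if e >= hi)
                results.append(Snippet(label, term, text[start:end].strip()))
    return results


if __name__ == "__main__":
    doc = ("The sky is blue. Our new Widget Pro ships today. "
           "It costs ten dollars. Thanks for reading.")
    for snippet in extract_snippets(doc, {"product": ["widget pro"]}, window=2):
        print(snippet)
```

In a transform, `taxonomy` and `window` would presumably come from the transform's configuration, and each `Snippet` would become an output row with the label, matched term, and extracted text as fields.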

Use Case: The plugin would be useful for users who need to focus on specific sections of large documents, such as passages containing known bias terms, industry-specific terms, product names, or other key phrases. Instead of requiring users to review entire documents manually, the plugin would automatically extract and label the relevant text segments. The labelled segments could also support semi-supervised learning, where preliminary labelled data guides the analysis of unlabelled data.

Issue Priority

Priority: 3

Issue Component

Component: Transforms

ep9io, Oct 11 '24 08:10