open-semantic-etl

Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelin...

Results 42 open-semantic-etl issues

Show error messages from plugins directly in the Queue status UI, so they are readable there and not only in the index.

enhancement

The spaCy NER text size limit is one million characters. If the extracted plain text is longer than that, it should be split into segments, with a separate spaCy NER call for each segment.

bug
enhancement
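
The segmentation proposed in this issue could look roughly like the sketch below. It is not code from open-semantic-etl: the names `segment_text` and `extract_entities` are hypothetical, `nlp` is assumed to be an already loaded spaCy pipeline passed in by the caller, and cutting at whitespace is just one possible strategy (an entity that spans a cut would still be missed).

```python
def segment_text(text, max_chars=1_000_000):
    """Yield consecutive pieces of text, each at most max_chars long.

    Prefers to cut at the last space or newline inside the window so that
    words are not split; falls back to a hard cut if there is none.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Search for a whitespace boundary inside the current window.
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut + 1  # keep the whitespace with the left segment
        yield text[start:end]
        start = end


def extract_entities(nlp, text, max_chars=1_000_000):
    """Run NER once per segment and collect (entity text, label) pairs.

    `nlp` is assumed to be a loaded spaCy pipeline (hypothetical caller input).
    """
    entities = []
    for segment in segment_text(text, max_chars):
        doc = nlp(segment)
        entities.extend((ent.text, ent.label_) for ent in doc.ents)
    return entities
```

Because every character ends up in exactly one segment, joining the segments reproduces the original text, which makes the splitter easy to check without loading a model.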

Hi, I'm developing a data enhancer plugin as documented, but the following error is thrown while indexing files: Exception while data enrichment of arquivos_indexados/a.xml with plugin teste: process()...

I figured if I reran `opensemanticsearch-index-dir /path/to/dir` it would reindex the directory. It does get the new files that were added but it doesn't seem to remove files from the...

Add additional / different links / Relationship Types for different NER models, so you can see/filter NER by entity extraction from Thesaurus/Ontologies and ML like Stanford NER or SpaCy and/or...

enhancement

Add ID/URI of named entities extracted by SKOS thesaurus or ontology by Open Semantic Entity Search API, so they can be linked / enriched to/with other linked data imports in...

enhancement

Hi, Is there an easy way to remove an uploaded ontology? Right now, I am following these steps: 1. Go to `/etc/opensemanticsearch/facets` and remove the `config['facets']` line corresponding to the...

Add OCRd text from full PDF to PDF page segments

enhancement

Enrich by structured data embedded in JSON-LD / import linked data in JSON-LD format to Knowledge Graph