open-semantic-etl
Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction & named entity recognition) & data enrichment (annotation) pipelin...
So error messages from plugins are readable directly in the Queue status UI, not only in the index.
The spaCy NER text size limit is one million characters. If the extracted plain text is longer, it should be segmented, with a separate spaCy NER call for each segment.
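The segmentation described above could be sketched roughly as follows. This is only an illustration, not the project's implementation: `segment_text` and `extract_entities` are hypothetical helper names, and `nlp` is assumed to be an already loaded spaCy pipeline. Segments are cut at whitespace where possible, so an entity is less likely to be split across two calls.

```python
def segment_text(text, max_chars=1_000_000):
    """Yield consecutive chunks of text, each at most max_chars long.

    Hypothetical helper: prefers to cut at the last space or newline inside
    the window and falls back to a hard cut if there is none.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    start = 0
    length = len(text)
    while start < length:
        end = min(start + max_chars, length)
        if end < length:
            # Last whitespace inside the current window, or -1 if none.
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut + 1  # keep the whitespace with the left chunk
        yield text[start:end]
        start = end


def extract_entities(nlp, text, max_chars=1_000_000):
    """Run NER once per segment and collect (text, label) pairs.

    Assumes nlp is a loaded spaCy pipeline with an NER component.
    """
    entities = []
    for segment in segment_text(text, max_chars):
        doc = nlp(segment)
        entities.extend((ent.text, ent.label_) for ent in doc.ents)
    return entities
```

An entity that happens to straddle a hard cut (a window with no whitespace at all) would still be missed, so overlapping segments may be worth considering if that matters.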
Hi, I'm developing a data enhancer plugin, as described at but the following error is thrown while indexing files: Exception while data enrichment of arquivos_indexados/a.xml with plugin teste: process()...
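For reference, a minimal plugin skeleton might look like the sketch below. It rests on assumptions that the truncated error message above does not confirm: that the ETL instantiates a class named after the plugin and calls `process(parameters=..., data=...)`, expecting both dictionaries back as a tuple. The field name `teste_s` is purely illustrative.

```python
class teste(object):
    """Hypothetical data enrichment plugin named after the module."""

    def process(self, parameters=None, data=None):
        # Assumed calling convention: both arguments are dicts and both
        # must be returned, in this order.
        if parameters is None:
            parameters = {}
        if data is None:
            data = {}

        # Illustrative enrichment: record which document was seen.
        if "id" in parameters:
            data["teste_s"] = "processed " + parameters["id"]

        return parameters, data
```

If the method signature or the return value differs from what the caller expects, Python raises a `TypeError` that mentions `process()`, which may be worth checking first.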
I figured if I reran `opensemanticsearch-index-dir /path/to/dir` it would reindex the directory. It does get the new files that were added but it doesn't seem to remove files from the...
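One way to spot entries for files that no longer exist is to compare the paths already known to the index with what is on disk. The sketch below is an assumption-laden illustration, not a feature of the tool: `find_stale_paths` is a hypothetical helper, and the indexed IDs are assumed to be plain filesystem paths that were fetched separately.

```python
from pathlib import Path


def find_stale_paths(indexed_paths):
    """Return the indexed paths that no longer point to a file on disk.

    indexed_paths: iterable of plain path strings (an assumption; a real
    index may store URIs that need converting first).
    """
    return sorted(p for p in indexed_paths if not Path(p).is_file())
```

The returned list could then be passed to whatever delete mechanism the index offers.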
Add additional / different links / Relationship Types for different NER models, so you can see/filter NER by entity extraction from Thesaurus/Ontologies and ML like Stanford NER or SpaCy and/or...
Add ID/URI of named entities extracted by SKOS thesaurus or ontology by Open Semantic Entity Search API, so they can be linked / enriched to/with other linked data imports in...
Hi, Is there an easy way to remove an uploaded ontology? Right now, I am following these steps: 1. Go to `/etc/opensemanticsearch/facets` and remove the `config['facets']` line corresponding to the...
Add OCR'd text from full PDF to PDF page segments
Enrich by structured data embedded in JSON-LD / import linked data in JSON-LD format to Knowledge Graph
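As a rough sketch of the first half of this request, embedded JSON-LD could be pulled out of an HTML page with the standard library alone. This assumes the structured data sits in `<script type="application/ld+json">` elements; `JsonLdExtractor` and `extract_jsonld` are hypothetical names, and mapping the result to index fields or a knowledge graph is left open.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed JSON-LD documents from script elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                # Skip malformed blocks instead of failing the whole page.
                pass


def extract_jsonld(html):
    """Return a list of JSON-LD documents found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents
```

Each returned item is an ordinary Python dict or list, so keys such as `@type` and `name` can be read directly.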