open-semantic-etl
Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelin...
When starting the containers for the first time, Docker will use the first container that starts to fill the /etc/opensemanticsearch volume. This volume should not be initialized like this and...
Should be consolidated; currently spread across multiple places such as /etc/opensemanticsearch/facets and /etc/opensemanticsearch/enhancer-rdf
Unittest fails because it cannot delete the indexed document after the test:

    ======================================================================
    ERROR: test_warc (test_enhance_warc.Test_enhance_warc)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/opensemanticetl/test_enhance_warc.py", line 31, in test_warc
        etl_delete.delete(contained_doc_id)...
I am using the latest version of the opensemantic server virtual machine, 21.01.17, in which I updated and upgraded the Linux-based Debian environment. After the...
Hello, I'm interested in using opensemanticsearch to index documents in Norwegian. I see that Norwegian is not listed in the Document Language section of the setup at http://[yourserver]/search-apps/setup/. However, opensemanticsearch integrates Solr...
I am facing issues with a newly added annotation. I followed the steps below: 1) Go to the annotations option 2) Add a new annotation of type tag 3) Attach the tag to the...
Hello! I use open-semantic-search-vm_20.01.17.ova, and if I want to use Stanford NER I get this on the right: "Failed tasks while import & analysis (ETL) enhance_ner_stanford". Looking around a bit, I...
It has been like this for days; the number of imported documents has not changed. The system doesn't freeze, it just stays on 584029 documents to extract and...
I followed the readme, and when I add RSS feeds and then hit update, I see these errors in the logs:
```
[2021-02-13 18:13:04,115: WARNING/ForkPoolWorker-10] Failed to parse HTTP header last-modified...
```
Since Tika Python seems to have new settings for this, disable this log instead of deleting it after the tika-python call.
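
The general idea of disabling a log rather than deleting its output afterwards can be sketched with Python's standard `logging` module. This is only an illustration under assumptions: the note above does not say which log is meant or which Tika Python setting controls it, so the logger name `example.noisy` and the helper `silence_logger` are hypothetical and are not part of tika-python or open-semantic-etl.

```python
import io
import logging


def silence_logger(name: str) -> logging.Logger:
    """Stop the named logger from emitting or propagating any records.

    Hypothetical helper for illustration; the logger name is supplied by
    the caller and is not taken from tika-python.
    """
    logger = logging.getLogger(name)
    logger.handlers.clear()                   # drop any attached handlers
    logger.addHandler(logging.NullHandler())  # swallow records explicitly
    logger.propagate = False                  # do not pass records to parents
    logger.disabled = True                    # ignore all further log calls
    return logger


# Demonstration with a stand-in logger (assumed name, not a real library logger).
noisy = logging.getLogger("example.noisy")
noisy.propagate = False
stream = io.StringIO()
noisy.addHandler(logging.StreamHandler(stream))

noisy.warning("before")          # written to the stream
silence_logger("example.noisy")
noisy.warning("after")           # dropped, nothing is written

captured = stream.getvalue()
```

Compared with deleting a log after each call, switching the logger off once avoids the output being written at all. If the log in question is a file written by an external process rather than a Python logger, this approach would not apply and the relevant library setting would be needed instead.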