open-semantic-etl issues

How can I download modules

I want to install the opensemanticsearch-ner-python-django module in order to remove entities which have not been correctly identified. On the page https://opensemanticsearch.org/enhancer/named_entities_manager, it explains how to install it, but not...

vmsv

TSV to contenttype group spreadsheet

Content type text/tsv;... should belong to the content type group "spreadsheet", not "text document".

opensemanticsearch

Law code subreferences / taxonomy

Law code subreferences in text like "a b c § 123 Abs. 3 d e f" should be extracted as multiple law codes, "§ 123" and "§ 123 Abs. 3",...

opensemanticsearch

enhancement
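The extraction described in this issue could look roughly like the following. This is only a minimal sketch: the regular expression, the function name `extract_law_codes`, and the restriction to the "§ <number> Abs. <number>" shape are assumptions for illustration, not the project's actual law code extractor.

```python
import re

# Assumed pattern (hypothetical): "§ <number>" optionally followed by "Abs. <number>".
LAW_REF = re.compile(r"§\s*(\d+[a-z]?)(?:\s+Abs\.\s*(\d+))?")


def extract_law_codes(text):
    """Return each section reference and, if present, its subreference."""
    codes = []
    for match in LAW_REF.finditer(text):
        section, subsection = match.groups()
        base = f"§ {section}"
        candidates = [base]
        if subsection:
            # Keep the parent code and add the more specific subcode.
            candidates.append(f"{base} Abs. {subsection}")
        for code in candidates:
            if code not in codes:
                codes.append(code)
    return codes


print(extract_law_codes("a b c § 123 Abs. 3 d e f"))
# prints ['§ 123', '§ 123 Abs. 3']
```

Emitting the parent code together with the subcode is what allows both to be used as facets or taxonomy levels.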

Tika parameter for custom OCR dictionary

1

If Tika has a parameter for a custom Tesseract OCR dictionary, add it, as is already done for the OCR of PDF images.

opensemanticsearch

enhancement

Adding apache manifoldcf to etl

Hi, I need to extract documents and metadata from an Alfresco repository. I have tried using Apache ManifoldCF and connected Alfresco CMIS to Solr (opensemanticsearch). But I want to connect the...

kichenin

ETL Web: Parse last modification date from webserver

5

After the upgrade to Python 3 with urllib, there is a problem parsing the last modification date sent by the web server, for example Wed, 21 Jun 2017 11:35:20 +0000. The dateutil parser now in use seems not to...

opensemanticsearch

bug

help wanted
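For the Last-Modified parsing issue above, one possible approach is the standard library's `email.utils.parsedate_to_datetime`, which handles dates in the format quoted in the issue. This is a sketch, not the project's fix: the issue text is truncated, so the exact failure is unknown, and the helper name `last_modified_to_solr` and the choice of a UTC "Z" timestamp as output are assumptions.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def last_modified_to_solr(header_value):
    """Convert a Last-Modified style date string to a UTC ISO 8601 timestamp."""
    parsed = parsedate_to_datetime(header_value)
    if parsed.tzinfo is None:
        # A "-0000" offset yields a naive datetime; treat it as UTC (assumption).
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(last_modified_to_solr("Wed, 21 Jun 2017 11:35:20 +0000"))
# prints 2017-06-21T11:35:20Z
```

Normalizing to UTC keeps the stored date independent of the offset the web server happens to send.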

Twitter import: Add linked websites to indexing queue only, if yet not in index

1

Add linked websites to the indexing queue only if they are not yet in the index. This saves resources, because a website is then indexed only once, even if it is linked in many tweets.

opensemanticsearch

enhancement
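The check requested in this issue can be sketched as follows. Everything here is hypothetical: `enqueue_new_links`, the `is_indexed` callback, and the list used as a queue are stand-ins for illustration; in a real setup the callback would query the search index rather than an in-memory set.

```python
def enqueue_new_links(links, is_indexed, queue):
    """Append only links that are neither indexed nor already seen in this batch."""
    seen = set()
    added = []
    for url in links:
        if url in seen or is_indexed(url):
            continue  # skip duplicates within the batch and known documents
        seen.add(url)
        queue.append(url)
        added.append(url)
    return added


# Stand-in for an index lookup (assumption: the real check would ask the index).
already_indexed = {"https://example.org/a"}
queue = []
links = ["https://example.org/a", "https://example.org/b", "https://example.org/b"]
print(enqueue_new_links(links, already_indexed.__contains__, queue))
# prints ['https://example.org/b']
```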

Twitter import: Date filter

Add options to import tweets by date (since / to)

opensemanticsearch

enhancement

Change label "indexing new file" to "adding to document processing queue"

The message "indexing new file /media/folder/...." is misleading when the file is only being added to the queue and the indexing is done later, in parallel, by the daemon.

Mandalka

enhancement

Enhanced error handling for plugins

5

Implement enhanced error handling (fallback plugins and retry) for data enrichment or data analysis plugins: There should be parameters for each extraction & analysis plugin in the process chain for...

opensemanticsearch

enhancement
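The retry and fallback behaviour requested in this issue might be sketched like this. The function `run_with_fallback`, its `retries` and `delay` parameters, and the two example plugins are hypothetical names for illustration only; they do not reflect the project's plugin interface, and the per-plugin parameters mentioned in the issue are truncated in the listing.

```python
import time


def run_with_fallback(plugins, data, retries=2, delay=0.0):
    """Try each plugin in order, retrying on failure, and collect the errors."""
    errors = []
    for plugin in plugins:
        for attempt in range(1, retries + 1):
            try:
                return plugin(data), errors
            except Exception as exc:
                errors.append((plugin.__name__, attempt, str(exc)))
                if attempt < retries:
                    time.sleep(delay)  # wait before retrying the same plugin
    # Every plugin failed on every attempt.
    return None, errors


def primary_plugin(data):
    raise RuntimeError("service unavailable")


def fallback_plugin(data):
    return data.upper()


result, errors = run_with_fallback([primary_plugin, fallback_plugin], "text")
print(result)       # prints TEXT
print(len(errors))  # prints 2
```

Returning the collected errors alongside the result lets the caller log failed attempts without aborting the rest of the processing chain.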
