Markus Mandalka

Results: 23 issues by Markus Mandalka

Since tika-python now seems to offer such settings, disable this log instead of deleting it after the tika-python call.

enhancement

The message "indexing new file /media/folder/...." is misleading when the file is only added to the queue and indexed later, in parallel, by the daemon.

enhancement

So that error messages from plugins are readable directly in the Queue status UI, not only in the index.

enhancement

In addition to the new importer and UI for import / enrichment with Hypothesis annotations from a user or group by etl_hypothesis.py, add an enhance_hypothesis.py plugin to the ETL process for single documents.

Iterate/batch in etl_enrich by Solr cursor, so that iteration is possible without tagging entries as done. This is also the basis for regular or frequent export of many documents, e.g. for machine learning.
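
The cursor-based iteration described above could look roughly like the following. This is a minimal sketch, not the code of etl_enrich: it relies on Solr's documented `cursorMark` / `nextCursorMark` request and response parameters, while the function name `iterate_by_cursor`, the `fetch_page` callable (anything that sends the parameters to a Solr select handler and returns the parsed JSON) and the unique key field `id` are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterator


def iterate_by_cursor(
    fetch_page: Callable[[Dict[str, str]], dict],
    query: str = "*:*",
    rows: int = 100,
    unique_key: str = "id",  # assumption: the schema's uniqueKey field is "id"
) -> Iterator[dict]:
    """Yield all documents matching `query`, page by page, using a Solr cursor.

    `fetch_page` is a hypothetical helper that performs the HTTP request and
    returns the decoded JSON response as a dict.
    """
    cursor = "*"  # "*" starts a new cursor
    while True:
        params = {
            "q": query,
            "rows": str(rows),
            # Cursors require a sort that includes the uniqueKey field.
            "sort": f"{unique_key} asc",
            "cursorMark": cursor,
        }
        response = fetch_page(params)
        for doc in response["response"]["docs"]:
            yield doc
        next_cursor = response["nextCursorMark"]
        # Solr returns the same cursor mark again once nothing is left.
        if next_cursor == cursor:
            break
        cursor = next_cursor
```

Because the position is carried by the cursor mark and not by a field in the documents, no entry has to be tagged as done, and the same loop can be reused for repeated exports.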

Enrich entities in the index with context such as a description or Wikipedia page, and score the posted (con)text by matching this additional information.

Documentation for importing/indexing named entities from RDF ontologies, SKOS thesauri and lists of names.

Score locations by their distance to other locations in the same document, since locations mentioned together are mostly near one another.
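
One possible way to turn this idea into a number is sketched below. It is an illustration only: the great-circle (haversine) distance, the scoring formula `1 / (1 + mean distance in km)` and the names `haversine_km` and `score_candidates` are assumptions, not something the issue specifies.

```python
import math
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Coordinate, b: Coordinate) -> float:
    """Great-circle distance between two coordinates in kilometres."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point rounding.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def score_candidates(
    candidates: Dict[str, Coordinate], others: List[Coordinate]
) -> Dict[str, float]:
    """Score each candidate location by its mean distance to the other
    locations found in the same document (hypothetical formula:
    1 / (1 + mean distance in km), so nearer candidates score higher)."""
    scores = {}
    for name, coordinate in candidates.items():
        if not others:
            scores[name] = 0.0  # no context, so no evidence either way
            continue
        mean_km = sum(haversine_km(coordinate, o) for o in others) / len(others)
        scores[name] = 1.0 / (1.0 + mean_km)
    return scores


# Usage: an ambiguous "Paris" in a document that also mentions Lyon and Marseille.
candidates = {"Paris, France": (48.8566, 2.3522), "Paris, Texas": (33.6609, -95.5555)}
context = [(45.7640, 4.8357), (43.2965, 5.3698)]
scores = score_candidates(candidates, context)
best = max(scores, key=scores.get)
```

In this example the French candidate gets the higher score because it is far closer to the other locations in the document.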

Score by comparing the named entity recognition class with the RDF class of the named entity.

Score by similarity of names using Levenshtein distance.
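
A name similarity score based on Levenshtein distance could be sketched as follows. The edit distance itself is the standard dynamic-programming algorithm; normalising it to a 0..1 score by the length of the longer name, and the function names `levenshtein` and `name_similarity`, are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions and substitutions
    needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(insertion, deletion, substitution))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Hypothetical score: 1.0 for identical names, approaching 0.0 as the
    edit distance approaches the length of the longer name."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

A name found in the text can then be compared with each candidate label from the thesaurus, and the candidate with the highest score preferred.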