Markus Mandalka
Since tika-python now seems to provide settings for this, disable this log instead of deleting it after the tika-python call.
The message "indexing new file /media/folder/...." is misleading when the file is only added to the queue and indexed later in parallel by the daemon.
Error messages from plugins are now readable directly in the Queue status UI, not only in the index.
In addition to the new importer and UI for import / enrichment with Hypothesis annotations from a user or group by etl_hypothesis.py, add an enhance_hypothesis.py plugin to the ETL process for single documents.
Iterate/batch in etl_enrich by Solr cursor, so that iteration is possible without tagging entries as done. This is also the base for regular or frequent export of many documents, e.g. for machine learning.
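A minimal sketch of cursor-based iteration, not the actual etl_enrich code: it uses Solr's `cursorMark` / `nextCursorMark` paging and assumes that the uniqueKey field is named `id`. The helper names `solr_select` and `iterate_by_cursor` and the example core URL are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def solr_select(solr_url, params):
    """Run one Solr select request and return the parsed JSON response."""
    query = urllib.parse.urlencode(params)
    with urllib.request.urlopen(solr_url + "/select?" + query) as response:
        return json.load(response)


def iterate_by_cursor(solr_url, query="*:*", rows=100, fetch=solr_select):
    """Yield every document matching the query, fetched page by page."""
    cursor = "*"  # "*" starts a new cursor
    while True:
        params = {
            "q": query,
            "rows": rows,
            # Cursors need a sort that includes the uniqueKey field
            # (assumed here to be "id").
            "sort": "id asc",
            "cursorMark": cursor,
            "wt": "json",
        }
        result = fetch(solr_url, params)
        yield from result["response"]["docs"]

        next_cursor = result["nextCursorMark"]
        if next_cursor == cursor:
            # An unchanged cursor means there are no further results.
            break
        cursor = next_cursor
```

Usage could look like `for doc in iterate_by_cursor("http://localhost:8983/solr/core1"): ...`, where the URL is only a placeholder. Because the position is kept in the cursor, no document has to be tagged as done.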
Enrich entities in the index with context such as a description or Wikipedia page, and score the posted (con)text by matching it against this additional information.
Documentation for importing/indexing named entities from RDF ontologies, SKOS thesauri and lists of names.
Score locations by distance to other locations in the same document, since locations mentioned together are mostly near each other.
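One possible way to express such a distance score, as a sketch under assumptions rather than the implemented scoring: candidates are (latitude, longitude) pairs in degrees, distance is the great-circle (haversine) distance, and the function names and the `scale_km` parameter are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp against rounding errors before taking the square root.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def score_location(candidate, other_locations, scale_km=1000.0):
    """Score a candidate higher the closer it is to the other locations.

    Returns a value in (0, 1], or 0.0 if there is nothing to compare with.
    """
    if not other_locations:
        return 0.0
    mean_km = sum(haversine_km(candidate, other)
                  for other in other_locations) / len(other_locations)
    return 1.0 / (1.0 + mean_km / scale_km)
```

For a document that also mentions Lyon and Marseille, this ranks Paris in France above Paris in Texas as the meaning of an ambiguous "Paris".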
Score by comparing the named entity recognition class with the RDF class of the named entity.
Score by similarity of name using Levenshtein distance.
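The name similarity can be sketched as follows. This is an illustration, not the code used in the project: it computes the Levenshtein distance with the standard dynamic programming approach and normalizes it to a score between 0 and 1. The case-insensitive comparison and the function names are assumptions.

```python
def levenshtein(a, b):
    """Return the number of single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def name_similarity(a, b):
    """Return 1.0 for equal names (ignoring case), lower for distant ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest
```

Dividing by the length of the longer name keeps the score comparable between short and long names.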