elasticsearch-entity-resolution
Generate query from XML configuration file
Suppose we have an XML configuration file, generated by Duke's genetic algorithm or written by hand (possibly Duke's own output XML), which contains, for each field of a given entity, the thresholds (high, low), comparator algorithm, cleaners, etc. It would be useful for elasticsearch-entity-resolution to be able to "ingest" this XML configuration, so that anyone can generate a query from an XML file.
This is a sample configuration:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<duke>
  <schema>
    <threshold>0.40440177844276326</threshold>
    <property type="id">
      <name>Id</name>
    </property>
    <property>
      <name>Name</name>
      <comparator>no.priv.garshol.duke.comparators.ExactComparator</comparator>
      <low>0.41346477540386545</low>
      <high>0.5427865167850188</high>
      <cleaner>
        <name>"no.priv.garshol.duke.cleaners.TrimCleaner"</name>
      </cleaner>
      <cleaner>
        <name>"no.priv.garshol.duke.cleaners.LowerCaseNormalizeCleaner"</name>
      </cleaner>
    </property>
  </schema>
</duke>
This is the output of Duke's genetic algorithm, with <cleaner> elements added to the configuration.
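To make the request more concrete, here is a minimal sketch in Python of what "ingesting" such a file could look like. It only uses the standard library's xml.etree.ElementTree. The function names (parse_duke_schema, build_entity_params) and the shape of the returned dictionary are hypothetical, chosen for illustration; they are not the plugin's confirmed query format, and the real implementation would presumably live in the plugin's own code base.

```python
import xml.etree.ElementTree as ET

# Sample taken from the configuration shown in this issue.
SAMPLE = """<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<duke>
  <schema>
    <threshold>0.40440177844276326</threshold>
    <property type="id">
      <name>Id</name>
    </property>
    <property>
      <name>Name</name>
      <comparator>no.priv.garshol.duke.comparators.ExactComparator</comparator>
      <low>0.41346477540386545</low>
      <high>0.5427865167850188</high>
      <cleaner>
        <name>"no.priv.garshol.duke.cleaners.TrimCleaner"</name>
      </cleaner>
      <cleaner>
        <name>"no.priv.garshol.duke.cleaners.LowerCaseNormalizeCleaner"</name>
      </cleaner>
    </property>
  </schema>
</duke>
"""


def _to_float(text):
    """Convert element text to float, or None when the element is absent."""
    return float(text) if text and text.strip() else None


def parse_duke_schema(xml_text):
    """Return (threshold, id_field, fields) read from the <schema> element."""
    schema = ET.fromstring(xml_text).find("schema")
    if schema is None:
        raise ValueError("no <schema> element found")
    threshold = _to_float(schema.findtext("threshold"))
    id_field = None
    fields = []
    for prop in schema.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if prop.get("type") == "id":
            # The id property carries no comparator or thresholds.
            id_field = name
            continue
        # The sample wraps cleaner class names in double quotes; strip them.
        cleaners = [
            (c.findtext("name") or "").strip().strip('"')
            for c in prop.findall("cleaner")
        ]
        fields.append({
            "field": name,
            "comparator": (prop.findtext("comparator") or "").strip(),
            "low": _to_float(prop.findtext("low")),
            "high": _to_float(prop.findtext("high")),
            "cleaners": cleaners,
        })
    return threshold, id_field, fields


def build_entity_params(xml_text, values):
    """Attach the values to match against to each configured field.

    The returned structure is an assumed, illustrative shape only.
    """
    threshold, _id_field, fields = parse_duke_schema(xml_text)
    return {
        "threshold": threshold,
        "fields": [dict(f, value=values.get(f["field"])) for f in fields],
    }


if __name__ == "__main__":
    print(build_entity_params(SAMPLE, {"Name": "Acme Corp"}))
```

The id property is kept separate because it has no comparator or thresholds, and a field with no value supplied gets value None rather than being dropped, so a caller can decide how to handle it.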
Thanks. Hopefully I'll be able to commit something next week...