elasticsearch-entity-resolution

Generate query from XML configuration file

Status: Open · andrea-patricelli opened this issue 9 years ago · 2 comments

Suppose we have generated an XML configuration file, either with Duke's genetic algorithm or by hand (it could also be Duke's own output XML), that specifies, for each field of a given entity, the probability thresholds (high, low), the comparator algorithm, the cleaners, etc. It would be useful if elasticsearch-entity-resolution could "ingest" such an XML configuration, so that anyone can generate a query directly from the XML file.

andrea-patricelli, Jan 21 '16 10:01
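To make the role of these settings concrete: as we read Duke's source (`PropertyImpl.compare` and `Utils.computeBayes`; please verify against the Duke version in use), each property maps its comparator's similarity score to a probability ranging from `low` (dissimilar values) up to `high` (identical values), and the per-property probabilities are combined with Bayes' rule into a single score that Duke compares with the global `<threshold>`. The minimal Python sketch below mirrors that logic; the helper names are hypothetical and not part of Duke or of elasticsearch-entity-resolution.

```python
# Hypothetical helper names. The formulas follow our reading of Duke's
# PropertyImpl.compare and Utils.computeBayes and should be verified.


def property_probability(sim: float, low: float, high: float) -> float:
    """Turn a comparator similarity in [0, 1] into a match probability.

    sim == 1.0 yields `high`; any sim below 0.5 yields `low`; between
    0.5 and 1.0 the probability grows with sim squared towards `high`.
    """
    if sim >= 0.5:
        return (high - 0.5) * sim * sim + 0.5
    return low


def combine_bayes(pa: float, pb: float) -> float:
    """Combine two independent probabilities with Bayes' rule."""
    return (pa * pb) / (pa * pb + (1.0 - pa) * (1.0 - pb))


def match_probability(similarities: dict, settings: dict) -> float:
    """Fold each field's probability into one score.

    `similarities` maps field name -> comparator similarity;
    `settings` maps field name -> (low, high). Starts from 0.5 (no evidence).
    """
    prob = 0.5
    for field, sim in similarities.items():
        low, high = settings[field]
        prob = combine_bayes(prob, property_probability(sim, low, high))
    return prob
```

Because combining with 0.5 leaves a probability unchanged, a configuration with a single compared property (like the sample in this thread) simply yields that property's probability, which is then checked against `<threshold>`.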

This is a sample configuration:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<duke>
  <schema>
    <threshold>0.40440177844276326</threshold>
    <property type="id">
      <name>Id</name>
    </property>
    <property>
      <name>Name</name>
      <comparator>no.priv.garshol.duke.comparators.ExactComparator</comparator>
      <low>0.41346477540386545</low>
      <high>0.5427865167850188</high>
      <cleaner>
        <name>no.priv.garshol.duke.cleaners.TrimCleaner</name>
      </cleaner>
      <cleaner>
        <name>no.priv.garshol.duke.cleaners.LowerCaseNormalizeCleaner</name>
      </cleaner>
    </property>
  </schema>
</duke>

This is the output of Duke's genetic algorithm, with <cleaner> elements added to the configuration.

andrea-patricelli, Jan 21 '16 11:01

Thanks. Hopefully I will be able to commit something next week.

YannBrrd, Feb 17 '16 08:02
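The ingestion step requested in this issue might start with something like the minimal Python sketch below, which reads the `<schema>` section of a Duke-style XML file (the format of the sample configuration in this thread, including the `<cleaner>` elements inside `<property>`) into plain data. The function names and the output keys (`threshold`, `fields`, `field`, `comparator`, `low`, `high`, `cleaners`) are hypothetical choices for illustration; they are not the documented query format of elasticsearch-entity-resolution, and mapping this data onto an actual query is left open.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union


def _float_or_none(text: Optional[str]) -> Optional[float]:
    """Convert element text to float, or return None if absent or empty."""
    if text is None or not text.strip():
        return None
    return float(text)


def parse_duke_config(xml_data: Union[str, bytes]) -> dict:
    """Read the <schema> section of a Duke-style XML config into plain data.

    Returns {"threshold": float | None, "fields": [...]}, one dict per
    compared <property>. This output layout is a hypothetical choice.
    """
    root = ET.fromstring(xml_data)
    schema = root.find("schema")
    if schema is None:
        raise ValueError("no <schema> element found")

    result = {"threshold": _float_or_none(schema.findtext("threshold")), "fields": []}
    for prop in schema.findall("property"):
        # The id property identifies records and has no comparator, so skip it.
        if prop.get("type") == "id":
            continue
        cleaners = []
        for cleaner in prop.findall("cleaner"):
            # Tolerate class names wrapped in double quotes, as in some samples.
            cleaners.append((cleaner.findtext("name") or "").strip().strip('"'))
        result["fields"].append({
            "field": (prop.findtext("name") or "").strip(),
            "comparator": (prop.findtext("comparator") or "").strip() or None,
            "low": _float_or_none(prop.findtext("low")),
            "high": _float_or_none(prop.findtext("high")),
            "cleaners": cleaners,
        })
    return result
```

Passing the file contents as bytes (for example from `open(path, "rb").read()`) lets the XML declaration's `encoding` attribute be honored. Keeping the parsing separate from query construction means the same extracted settings could feed whatever query structure the plugin ends up accepting.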