Lars Marius Garshol
_From [[email protected]](https://code.google.com/u/106380900043315593284/) on March 16, 2013 04:40:01_ **Blocking:** -duke:26
This is becoming more urgent as new Lucene versions are released. Need to take a serious new look at this.
That's odd. I use it with the JDBC data source all the time, with no problem. Could you post your configuration, so we can see if there's something unusual there?
The trouble with implementing this one is that the config loading is so generic we really have no idea what's declared as s. It may be that reading the config...
_From [[email protected]](https://code.google.com/u/106380900043315593284/) on January 25, 2013 05:41:48_ Algorithm reference: http://www.algorithmist.com/index.php/Longest_Common_Subsequence
_From [[email protected]](https://code.google.com/u/106380900043315593284/) on October 23, 2013 10:15:45_ Longest common subSTRING has been implemented, but longest common subsequence is actually a different comparator, so that's still not done.
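To make the distinction in the comment above concrete, here is a minimal sketch of both measures using the standard dynamic-programming formulations. It is not the project's comparator code; the function names are hypothetical and chosen only for illustration. A substring must be contiguous, while a subsequence may skip characters as long as order is preserved.

```python
def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous run of characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the run that ended at the previous character of both strings.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
            # On a mismatch the run is broken, so curr[j] stays 0.
        prev = curr
    return best


def longest_common_subsequence(a: str, b: str) -> int:
    """Length of the longest in-order, not necessarily contiguous, shared sequence."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
            else:
                # On a mismatch, carry forward the best result seen so far.
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    # One dropped letter splits the shared substring but not the shared subsequence.
    print(longest_common_substring("garshol", "garsol"))
    print(longest_common_subsequence("garshol", "garsol"))
```

The only difference between the two functions is the mismatch branch: the substring version resets to zero, the subsequence version keeps the best value seen so far. That is why a single dropped character hurts the substring score much more than the subsequence score.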
_From [[email protected]](https://code.google.com/u/116193730723037190676/) on September 25, 2013 05:37:20_ It would be a great feature: the deduplication functionality could be integrated with Apache Solr and work as yet another REST API.
_From [[email protected]](https://code.google.com/u/106380900043315593284/) on September 25, 2013 11:22:35_ For ElasticSearch there is actually a module for this: https://github.com/YannBrrd/elasticsearch-entity-resolution It might be an idea to make something similar for Solr. Or one...
Doing this, but unfortunately it turns out that we need _many_ experiments before we can conclude anything with any certainty.
Hi there, 1) I could put up some benchmarks, but IMHO they would be useless. Accuracy varies with the data available and the amount of noise in the data. 2)...