MacPassHTTP
Update URL matching to use Levenshtein distance
This update uses the Levenshtein distance algorithm to determine the best-matched entry, similar to how the KeePassHTTP plugin behaves. This alleviates issues such as "www.facebook.com" not matching an entry whose URL is "facebook.com".
Note that this might produce false positives, particularly if passed a URL that doesn't exist in any of the entries, but in my experiments it works quite well.
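The matching itself boils down to computing an edit distance between the requested URL and each entry's URL and preferring the smallest one. A minimal sketch of that idea (not the PR's actual code; the class and method names here are made up):

using System;

static class Levenshtein
{
    // Standard dynamic-programming edit distance between two strings.
    // For example, Distance("www.facebook.com", "facebook.com") == 4,
    // which is small enough for that entry to win as the best match.
    public static int Distance(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;
        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1,   // deletion
                                            d[i, j - 1] + 1),  // insertion
                                   d[i - 1, j - 1] + cost);    // substitution
            }
        }
        return d[a.Length, b.Length];
    }
}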
In KeePassHTTP the Levenshtein distance is used to order Login entries but not to actually retrieve them, or am I reading the code wrong? You're using it to actually match, which changes the behaviour drastically. I do not intend to move away from the original implementation, and to be honest I just ported @jameshurst's implementation without any changes to the actual logic. If I'm wrong, it'll be merged promptly ;)
I just dug a bit deeper: the sorting is done in KeePassHTTPKit. That might be a good place to implement the Levenshtein distance and align KeePassHTTPKit with KeePassHTTP.
Hey @mstarke! Thanks for your prompt response! Looking into it further, I think I got tripped up by their README, where they state:
URL matching: How does it work?
KeePassHttp can receive 2 different URLs, called URL and SubmitURL.
CompareToUrl = SubmitURL if set, URL otherwise
For every entry, the Levenshtein Distance of its Entry-URL (or Title, if Entry-URL is not set) to the CompareToURL is calculated.
Only the Entries with the minimal distance are returned.
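In code, those rules amount to something like the following sketch. It is not KeePassHttp's actual implementation: the Entry record is a hypothetical stand-in for their PwEntry, and the Levenshtein distance is passed in as a delegate.

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for KeePassHttp's entry type.
record Entry(string Url, string Title);

static class MinimalDistanceFilter
{
    // CompareToUrl = SubmitURL if set, URL otherwise; score each entry by the
    // Levenshtein distance of its URL (or Title when the URL is empty) to
    // CompareToUrl; keep only the entries with the minimal distance.
    public static IEnumerable<Entry> Filter(IEnumerable<Entry> entries,
                                            string url,
                                            string submitUrl,
                                            Func<string, string, int> levenshtein)
    {
        string compareTo = string.IsNullOrEmpty(submitUrl) ? url : submitUrl;
        var scored = entries
            .Select(e => (Entry: e,
                          Distance: levenshtein(string.IsNullOrEmpty(e.Url) ? e.Title : e.Url,
                                                compareTo)))
            .ToList();
        if (scored.Count == 0)
            return Enumerable.Empty<Entry>();
        int minDistance = scored.Min(s => s.Distance);
        return scored.Where(s => s.Distance == minDistance).Select(s => s.Entry);
    }
}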
Looking at their code, it looks like they first filter to the entries that match the scheme and URL, then narrow those down to only the ones with the minimal Levenshtein distance. Their initial filter is a bit more robust than the one currently in KeePassHTTPKit:
while (listResult.Count == listCount && (origSearchHost == searchHost || searchHost.IndexOf(".") != -1))
{
    parms.SearchString = String.Format("^{0}$|/{0}/?", searchHost);
    var listEntries = new PwObjectList<PwEntry>();
    db.RootGroup.SearchEntries(parms, listEntries);
    foreach (var le in listEntries)
    {
        listResult.Add(new PwEntryDatabase(le, db));
    }
    searchHost = searchHost.Substring(searchHost.IndexOf(".") + 1);
    //searchHost contains no dot --> prevent possible infinite loop
    if (searchHost == origSearchHost)
        break;
}
listCount = listResult.Count;
It looks like they do a search for each split on the "." character, so for http://sub.my.url.com they search for sub.my.url.com, my.url.com, and url.com, then whittle those results down to whichever has the minimum Levenshtein distance.
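In other words, the loop above effectively walks through shorter and shorter host suffixes. A small sketch of just that suffix enumeration (the helper name is made up; the real loop also stops early as soon as a search for one of these hosts returns entries):

using System;
using System.Collections.Generic;

static class HostSuffixes
{
    // Mirrors the searchHost.Substring(searchHost.IndexOf(".") + 1) step quoted above.
    public static IEnumerable<string> Enumerate(string host)
    {
        string current = host;
        while (true)
        {
            yield return current;
            int dot = current.IndexOf(".");
            if (dot == -1)
                break;
            string next = current.Substring(dot + 1);
            if (next.IndexOf(".") == -1)
                break; // don't search a bare TLD such as "com"
            current = next;
        }
    }
}

// Enumerate("sub.my.url.com") yields: sub.my.url.com, my.url.com, url.com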
That algorithm would still solve my issue, where the passed-in URL is www.facebook.com but my database entry has facebook.com as the URL.
If it makes sense to you, I'd be happy to change the implementation to match the KeePassHTTP behavior more closely.