
Measure distance to nearest group

Open shamahutoto opened this issue 3 years ago • 2 comments

Hi there,

I want to find items that aren't matched but were just under the threshold for matching with a group. Is there a way to do this?

shamahutoto, Oct 07 '21 00:10

Disclaimer: I am a regular fastLink user, not a developer.

Please give an example to make the issue easier to understand.

For example, this copy-pasted code subsets to matches with a match probability of 0.85 and above:

# matches.out is the object returned by an earlier fastLink() call on dfA and dfB
matched_dfs <- getMatches(
  dfA = dfA, dfB = dfB,
  fl.out = matches.out, threshold.match = 0.85
)

I guess that you need to subset with blocking which is doable but more complicated. The developers are working on improving the blocking functionality.

aalexandersson, Oct 07 '21 00:10

Hi @shamahutoto,

As @aalexandersson mentions, one idea here would be to lower the matching threshold. By default, fastLink only returns pairs of records with a matching probability larger than 0.85. You can lower that value to, e.g., 0.001 and recover every pair with a matching probability above it, which is a larger set than the one the default produces. The pairs that fall between the new value and 0.85 are the ones that were just under the threshold. However, I would not recommend going too low: you will get pairs of records with a matching probability that is basically 0, and if the datasets you are matching are large, the fastLink objects will be incredibly large.
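A minimal sketch of the filtering step, in base R so it runs without fastLink installed. It assumes you have already rerun the match with a lowered threshold and collected the pairs into a data frame with one row per pair. The helper `near_misses()`, the column names `inds.a`, `inds.b`, and `posterior`, and the 0.5 lower bound are illustrative assumptions, not part of the fastLink API, and the data below is made up.

```r
# Hypothetical helper (not a fastLink function): keep the pairs whose
# posterior match probability falls in [lower, upper), i.e. the pairs
# that narrowly missed the usual matching threshold.
near_misses <- function(pairs, lower = 0.5, upper = 0.85) {
  stopifnot(all(c("inds.a", "inds.b", "posterior") %in% names(pairs)))
  keep <- pairs$posterior >= lower & pairs$posterior < upper
  out <- pairs[keep, , drop = FALSE]
  # Closest to the threshold first
  out[order(-out$posterior), , drop = FALSE]
}

# Made-up pairs standing in for the result of a run with a lowered
# threshold; the column layout is an assumption for this sketch.
pairs <- data.frame(
  inds.a    = c(1, 2, 3, 4, 5),
  inds.b    = c(10, 11, 12, 13, 14),
  posterior = c(0.99, 0.84, 0.60, 0.02, 0.86)
)

nm <- near_misses(pairs, lower = 0.5, upper = 0.85)
print(nm)
```

Choosing a moderate lower bound rather than something close to 0 keeps the result small and avoids the very large objects mentioned above.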

If anything comes up, let us know.

All my best,

Ted

tedenamorado, Oct 09 '21 04:10