Anders Alexandersson

Results 29 comments of Anders Alexandersson

Actually, fastLink on CRAN is only version 0.4, not 4.0 :-) Therefore, I instead wish that the known issues with blocking and aggregate confusion tables will be worked on first....

Disclaimer: I am a regular fastLink user, not a fastLink developer. Yes, and thank you for mentioning the documentation issue. The change was committed on Mar 29, 2019: [https://github.com/kosukeimai/fastLink/commit/7915255f13c90292f6d4de71e3eac3bde6cb1033](https://github.com/kosukeimai/fastLink/commit/7915255f13c90292f6d4de71e3eac3bde6cb1033)

From my user perspective, I fully agree with the suggestions. Another idea: Add the argument `partial.match = c("StreetName")` if you want to prioritize `StreetName` because without the argument you only...

Recommended by Canbek et al. (2021): [https://rdcu.be/cvT7d](https://rdcu.be/cvT7d)

Conclusion:

> In conclusion, this study proposes a new comprehensive benchmarking method to analyze the robustness of performance metrics and ranks 15 performance...

In case it helps, I think there are possibly three different issues with the deduplication example and therefore three possible "fixes": 1) A better error message is needed. 2) A...

I routinely use `fastLink` with similarly sized linkages (that is, 1000s * 3-4 million records) on a similarly specced desktop without any issues. Possible data cleaning (attribute alignment) issue:...

Yes, in my experience, 1928 blocks would be way too many blocks for ~2000 rows. I suggested max 10 blocks. In general, more blocking will result in a faster linkage...
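The trade-off behind this advice can be illustrated with some arithmetic. The following is a minimal sketch, not fastLink code: `comparison_pairs` is a hypothetical helper, the block keys are made-up toy data, and both datasets are assumed to have 2,000 rows. It counts how many candidate record pairs are compared when only records sharing a block key are paired.

```python
from collections import Counter


def comparison_pairs(blocks_a, blocks_b):
    """Count candidate pairs when only records with the same block key are compared.

    blocks_a and blocks_b hold one block key per record (hypothetical toy input).
    """
    counts_a = Counter(blocks_a)
    counts_b = Counter(blocks_b)
    # A Counter returns 0 for keys it has not seen, so unmatched blocks add nothing.
    return sum(n * counts_b[key] for key, n in counts_a.items())


n = 2000  # assumed number of rows in each dataset

# No blocking: every record in A is compared with every record in B.
no_blocking = comparison_pairs(["all"] * n, ["all"] * n)

# 10 evenly sized blocks: 200 rows per block in each dataset.
ten_blocks = comparison_pairs([i % 10 for i in range(n)],
                              [i % 10 for i in range(n)])

# 1928 blocks: almost every block holds a single row per dataset.
many_blocks = comparison_pairs([i % 1928 for i in range(n)],
                               [i % 1928 for i in range(n)])

print(no_blocking, ten_blocks, many_blocks)
```

In this toy setup, 10 blocks already cut the comparisons by a factor of 10, while 1928 blocks leave roughly one row per block. That is fast, but it presumably leaves too little data in each block for a probabilistic model to be estimated reliably, which is one way to read the "way too many blocks" warning.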

> I will thus try to get fewer blocks by increasing the size of each of them,

No, only increase the size of the small samples. You probably instead want...

Ted's "filter" approach can also be used for larger units such as year instead of dates (if some years will only appear in one dataset and if you want fewer...
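Ted's code is not shown here, so the following is only one possible reading of a year-level filter: drop records whose year appears in just one of the two datasets before blocking on year. The function name `filter_to_shared_years`, the `year` field, and the toy records are all assumptions made for illustration.

```python
def filter_to_shared_years(records_a, records_b, key="year"):
    """Keep only records whose year occurs in both datasets.

    Records are plain dicts; `key` names the (assumed) year field.
    Returns the two filtered lists and the set of shared years.
    """
    shared = {r[key] for r in records_a} & {r[key] for r in records_b}
    keep_a = [r for r in records_a if r[key] in shared]
    keep_b = [r for r in records_b if r[key] in shared]
    return keep_a, keep_b, shared


# Hypothetical toy data: 2017 appears only in A, 2020 only in B.
records_a = [{"id": 1, "year": 2017}, {"id": 2, "year": 2018}, {"id": 3, "year": 2019}]
records_b = [{"id": 10, "year": 2018}, {"id": 11, "year": 2019}, {"id": 12, "year": 2020}]

keep_a, keep_b, shared = filter_to_shared_years(records_a, records_b)
print(sorted(shared), [r["id"] for r in keep_a], [r["id"] for r in keep_b])
```

Years that survive the filter can then serve as block keys, so no block is created for a year that has no possible match on the other side.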