Ted Enamorado
Hi, The code has been adjusted to accommodate this knife-edge case. If anything else comes up, please let us know. Note that it is because of this community that we have...
Hi, To construct your blocks, you are performing exact matching on `sexe` and `date_naissance`. The problem is that, by loading the larger dataset, you are exhausting a lot of memory resources....
Hi, One possibility would be to remove from the larger dataset all the dates that do not appear in the smaller one (and the other way around). That way, you...
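A minimal base R sketch of the filtering idea above, assuming two hypothetical data frames `dfA` and `dfB` that share a `date_naissance` column (the toy data are made up for illustration and are not from the thread):

```r
## Hypothetical toy data; only the column name `date_naissance` comes from the thread.
dfA <- data.frame(
  id = 1:5,
  date_naissance = c("1980-01-01", "1981-02-02", "1982-03-03",
                     "1983-04-04", "1984-05-05"),
  stringsAsFactors = FALSE
)
dfB <- data.frame(
  id = 1:3,
  date_naissance = c("1981-02-02", "1983-04-04", "1990-06-06"),
  stringsAsFactors = FALSE
)

## Dates that appear in both datasets
shared_dates <- intersect(dfA$date_naissance, dfB$date_naissance)

## Keep only the rows whose date can possibly find a match on the other side
dfA_small <- dfA[dfA$date_naissance %in% shared_dates, , drop = FALSE]
dfB_small <- dfB[dfB$date_naissance %in% shared_dates, , drop = FALSE]
```

Because the blocks rely on exact matching on the date, rows dropped here could never have been paired anyway, so the reduced data frames can be passed on to the blocking step with less memory pressure.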
Hi, I hope all is well. While fastLink calculates the similarity measures for the observations in the cross-product of two datasets, such numbers get recycled as this is the most...
No problem! I think this matches your request when focusing on first names in our sample data. ```{r eval = TRUE, echo = TRUE, tidy=FALSE, warning=FALSE, error=FALSE, message=FALSE} ## Load...
This looks fantastic! We will take a close look at your fixes and test them during the Winter break. We are also working on some additional efficiency gains -- something...
Thanks to both of you! As @aalexandersson mentions, the problem is that you have a variable with no variation. We have warnings when there is no variation in a variable...
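As a rough way to spot such variables before linking, one could check each linkage field for a single distinct value. This is a sketch in base R; `no_variation` is a hypothetical helper, not a fastLink function, and the toy data are made up:

```r
## Hypothetical helper: returns the names in `vars` whose column in `df`
## has at most one distinct non-missing value.
no_variation <- function(df, vars) {
  flat <- vapply(
    vars,
    function(v) length(unique(stats::na.omit(df[[v]]))) <= 1,
    logical(1)
  )
  vars[flat]
}

## Toy example: `country` is constant, `name` is not
toy <- data.frame(
  name = c("ana", "ben", "cal"),
  country = c("US", "US", "US"),
  stringsAsFactors = FALSE
)
constant_vars <- no_variation(toy, c("name", "country"))
```

Any variable flagged this way carries no information for distinguishing matches from non-matches, so it is a candidate for removal from `varnames`.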
Hi @muranyia, No problem! Quick question: what happens if you set `cut.a` to a lower value? e.g.,
```
fastLink(data.table.1, data.table.2,
         varnames = c("FullName", "EMail"),
         stringdist.match = c("FullName", "EMail"),
         cut.a = 0.90, dedupe.matches = TRUE,
         verbose = TRUE, return.df = TRUE, n.cores = 4)
```
...
@muranyia, following on @kosukeimai's suggestion, I would try to divide the emails into components e.g., username, email service, and domain. I would expect little to no typographical errors in the...
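One possible way to do this split in base R is sketched below. `split_email` is a hypothetical helper and the sample addresses are made up; it assumes each address contains a single `@` and treats the last dot-separated piece of the host as the domain:

```r
## Hypothetical helper: split e-mail addresses into username, service, and domain.
## Assumes well-formed addresses with one "@"; no validation is attempted.
split_email <- function(email) {
  email <- tolower(trimws(email))
  host <- sub("^[^@]*@", "", email)               # everything after the "@"
  data.frame(
    username = sub("@.*$", "", email),            # everything before the "@"
    service  = sub("\\.[^.]*$", "", host),        # host without its last piece
    domain   = sub("^.*\\.", "", host),           # last piece of the host
    stringsAsFactors = FALSE
  )
}

parts <- split_email(c(" Jane.Doe@Gmail.com", "j.smith@mail.example.org"))
```

The resulting columns can be added to each dataset and used as separate linkage fields, for example with string-distance matching on `username` only.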
Thanks for getting back to us! Parsing the emails as discussed above might help. Quick question: When you say 20K cases, that is referring to the larger dataset, right? Let's...