Ted Enamorado comments

Results: 46 comments of Ted Enamorado

NA values create error in getMatches()

Hi, the code has been adjusted to accommodate this knife-edge case. If anything else pops up, please let us know. Note that it is because of this community that we have...

blockData – Error: Vector memory exhausted (limit reached?)

Hi, to construct your blocks, you are performing exact matching on `sexe` and `date_naissance`. The problem is that, by loading the larger dataset, you are using up a lot of memory....

blockData – Error: Vector memory exhausted (limit reached?)

Hi, One possibility would be to remove from the larger dataset all the dates that do not appear in the smaller one (and the other way around). That way, you...
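A minimal sketch of the filtering idea in this comment, in base R. The data frames `large` and `small` and their values are hypothetical; only the column name `date_naissance` comes from the thread. It keeps, in each dataset, only the rows whose date also appears in the other dataset, so exact-match blocks that could never contain a pair are dropped before blocking.

```r
# Hypothetical toy data; only the column name `date_naissance` is from the thread.
large <- data.frame(
  id = 1:6,
  date_naissance = c("1980-01-01", "1980-01-02", "1990-05-05",
                     "1975-03-03", "1980-01-01", "2000-12-31"),
  stringsAsFactors = FALSE
)
small <- data.frame(
  id = 101:103,
  date_naissance = c("1980-01-01", "1990-05-05", "1960-07-07"),
  stringsAsFactors = FALSE
)

# Dates that appear in both datasets.
shared_dates <- intersect(large$date_naissance, small$date_naissance)

# Drop rows whose date has no counterpart in the other dataset.
large_trim <- large[large$date_naissance %in% shared_dates, , drop = FALSE]
small_trim <- small[small$date_naissance %in% shared_dates, , drop = FALSE]
```

With real data, the trimmed data frames would then be passed to the blocking step in place of the full ones.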

question / documentation

Hi, I hope all is well. While fastLink calculates the similarity measures for the observations in the cross-product of two datasets, such numbers get recycled as this is the most...

question / documentation

No problem! I think this matches your request when focusing on first names in our sample data. ```{r eval = TRUE, echo = TRUE, tidy=FALSE, warning=FALSE, error=FALSE, message=FALSE} ## Load...

Performance (gamma*() functions)

This looks fantastic! We will take a close look at your fixes and test them during the Winter break. We are also working on some additional efficiency gains -- something...

Col::subvec() error with some data

Thanks to both of you! As @aalexandersson mentions, the problem is that you have a variable with no variation. We have warnings when there is no variation in a variable...
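One way to spot such a variable before linking is to count the distinct non-missing values per column. The sketch below is plain base R; `no_variation()` is a hypothetical helper, not a fastLink function, and the toy data frame is made up for illustration.

```r
# Hypothetical helper (not part of fastLink): return the names of the
# variables in `vars` that have at most one distinct non-missing value.
no_variation <- function(df, vars) {
  is_constant <- vapply(
    vars,
    function(v) {
      x <- df[[v]]
      length(unique(x[!is.na(x)])) <= 1
    },
    logical(1)
  )
  unname(vars[is_constant])
}

# Toy data: `state` never varies, `firstname` does.
dfA <- data.frame(
  firstname = c("ana", "bob", "cy"),
  state     = c("NJ", "NJ", NA),
  stringsAsFactors = FALSE
)

constant_vars <- no_variation(dfA, c("firstname", "state"))
```

Any variable flagged this way is a candidate to drop from the set of linkage variables.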

Col::subvec() error with some data

Hi @muranyia, No problem! Quick question: what happens if you set `cut.a` to a lower value? e.g.,
```
fastLink(data.table.1, data.table.2,
         varnames = c("FullName", "EMail"),
         stringdist.match = c("FullName", "EMail"),
         cut.a = 0.90, dedupe.matches = TRUE,
         verbose = TRUE, return.df = TRUE, n.cores = 4)
```
...

Col::subvec() error with some data

@muranyia, following on @kosukeimai's suggestion, I would try to divide the emails into components e.g., username, email service, and domain. I would expect little to no typographical errors in the...
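A rough sketch of that split in base R. `split_email()` is a hypothetical helper, and the reading of the three components is an assumption: "username" is the part before `@`, "domain" is the text after the last dot, and "email service" is whatever sits between them. It also assumes each address is well formed, with exactly one `@`.

```r
# Hypothetical helper: split well-formed addresses into three components.
# Assumed reading: username @ service . domain
split_email <- function(email) {
  email <- tolower(trimws(email))           # normalise case and whitespace
  username <- sub("@.*$", "", email)        # everything before the "@"
  host     <- sub("^[^@]*@", "", email)     # everything after the "@"
  service  <- sub("\\.[^.]*$", "", host)    # host without its last label
  domain   <- sub("^.*\\.", "", host)       # last label of the host
  data.frame(username = username, service = service, domain = domain,
             stringsAsFactors = FALSE)
}

parts <- split_email(c("Jane.Doe@Gmail.com ", "a.b@mail.example.org"))
```

The resulting columns could then be added to each dataset and listed as separate entries in `varnames`, so that each component is compared on its own.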

Col::subvec() error with some data

Thanks for getting back to us! Parsing the emails as discussed above might help. Quick question: When you say 20K cases, that is referring to the larger dataset, right? Let's...