reclin
Keeping weights following final linkage
Hi there, I am still learning reclin's functionality but have been pretty happy with the package, so thank you for your work and for making it available.
I am validating the linkage and choosing a threshold, and to do that I am trying to retain the weights. For some reason I am losing them, so I am looking for confirmation that my approach should work. When I join the Linked.. data frame with the P_Link_Atts.. data frame on Id.x and Id.y, I would expect every linked record to pick up its weight, but the vast majority of the Linked.. records don't get one. Looking into the p object and the associated P_Link_Atts.. data frame, many linkages shown in the Linked.. data frame are not present in the weights.
My presumption is that the x and y values are row names, so I create separate columns titled "Id.x" and "Id.y" to use as join keys, but maybe that's where I am going wrong.
My goal is simply to retain the weight values after applying the link() function so I can check how the linkage performs by weight and adjust the threshold. Thanks for any help, and I hope the issue is clear. Sorry that I can't supply the data, as it is filled with PII, but if a more workable example is necessary I can build some vignette data.
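#Load packages (reclin for the linkage steps, dplyr for %>%, mutate and left_join)
library(reclin)
library(dplyr)
#Blocking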
p <- pair_blocking(Select_Ems..,Select_Partic.., c("County","Crash_Week"), large = FALSE)
#Compare the records on their linkage keys - basic
#p <- compare_pairs(p, by = c("First_Name","Middle_Initial","Last_Name","DOB","Sex"))
#Compare using Jaro-Winkler
#p <- compare_pairs(p, by = c("First_Name","Middle_Initial","Last_Name","DOB","Sex","Crash_Date"), default_comparator = jaro_winkler(0.9), overwrite = TRUE)
p <- compare_pairs(p, by =
c("First_Name","Middle_Initial","Last_Name","DOB_Day","DOB_Month","DOB_Year","Sex","Crash_Date"),
default_comparator = jaro_winkler(0.9), overwrite = TRUE)
#Force 1 to 1 linkage
p_4 <- select_n_to_m(p, "weight", var = "ntom", threshold = 2.2)
#Keep only links with x id
Linked.. <- link(p_4, all_x = TRUE, all_y = FALSE)
#Create a data frame object of linked data attributes
P_Link_Atts.. <- as.data.frame(p) %>% mutate(Id.x = as.character(x), Id.y = as.character(y))
#Join weights
Linked.. <- left_join(Linked.., P_Link_Atts.., by = c("Id.x","Id.y"))
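To make the question concrete without PII, here is a small sketch of the join using made-up data and base R only. Every table name and value in it is invented for illustration, and it does not call reclin at all. It rests on one assumption that I have not confirmed: that the x and y columns of the pairs object are row positions into the two input tables rather than my Id values. Under that assumption it compares joining on the positions cast to character (what my code above does) with joining on Id values looked up by position.

```r
# Made-up stand-ins for the two source tables (no PII). The Id values are
# deliberately different from the row positions.
ems    <- data.frame(Id = c("E101", "E102", "E103"), stringsAsFactors = FALSE)
partic <- data.frame(Id = c("P900", "P901"), stringsAsFactors = FALSE)

# Hypothetical shape of as.data.frame(p); x and y are ASSUMED to be row positions.
pairs <- data.frame(x = c(1L, 3L), y = c(2L, 1L), weight = c(5.1, 3.4))

# Hypothetical shape of the link() output: one row per x record, with
# Id.x / Id.y holding the original Id columns (NA where nothing was linked).
linked <- data.frame(Id.x = c("E101", "E102", "E103"),
                     Id.y = c("P901", NA, "P900"),
                     stringsAsFactors = FALSE)

# Join 1: keys are the row positions cast to character (my current approach).
keys_pos <- pairs
keys_pos$Id.x <- as.character(pairs$x)
keys_pos$Id.y <- as.character(pairs$y)
joined_pos <- merge(linked, keys_pos[c("Id.x", "Id.y", "weight")],
                    by = c("Id.x", "Id.y"), all.x = TRUE)

# Join 2: keys are the Id values looked up by row position.
keys_id <- pairs
keys_id$Id.x <- ems$Id[pairs$x]
keys_id$Id.y <- partic$Id[pairs$y]
joined_id <- merge(linked, keys_id[c("Id.x", "Id.y", "weight")],
                   by = c("Id.x", "Id.y"), all.x = TRUE)

# Count how many linked rows received a weight under each keying.
n_pos <- sum(!is.na(joined_pos$weight))
n_id  <- sum(!is.na(joined_id$weight))
print(c(position_keys = n_pos, id_keys = n_id))
```

In this toy data the position-based keys never match the Id columns, which looks a lot like the missing weights I see in my real data. If the assumption about x and y is wrong, the sketch does not apply.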