
Keeping weights following final linkage

Open JoshRoll opened this issue 3 years ago • 0 comments

Hi there, I am still learning reclin functionality but have been pretty happy with this package so thank you so much for your work and making this utility available.

I am working on validating the linkage and choosing a proper threshold, and in doing so I am trying to retain the weights. For some reason I am losing them, and I am looking for confirmation that the approach I am taking should work. When I join the Linked.. data frame with the P_Link_Atts.. data frame on Id.x and Id.y, I would expect every record in the linked data set to get a weight, but the vast majority of the Linked.. records don't get one. Looking into the p object and the associated P_Link_Atts.. data frame, many linkages shown in the Linked.. data frame are not in the weights.

My presumption is that the x and y values are row names, so I create separate columns titled "Id.x" and "Id.y" as joining vectors, but maybe that's where I am going wrong.

My goal is simply to retain the weight values after applying the link() function, so I can check how my linkage performs by weight and adjust the threshold. Thanks for any help, and I hope this issue is clear. Sorry that I can't supply data, as it is filled with PII, but if a more workable example is necessary I can build some vignette data.

    library(reclin)
    library(dplyr)

    # Blocking
    p <- pair_blocking(Select_Ems.., Select_Partic.., c("County", "Crash_Week"), large = FALSE)

    # Compare the records on their linkage keys - basic
    # p <- compare_pairs(p, by = c("First_Name", "Middle_Initial", "Last_Name", "DOB", "Sex"))
    # Compare using Jaro-Winkler
    # p <- compare_pairs(p, by = c("First_Name", "Middle_Initial", "Last_Name", "DOB", "Sex", "Crash_Date"),
    #   default_comparator = jaro_winkler(0.9), overwrite = TRUE)
    p <- compare_pairs(p,
      by = c("First_Name", "Middle_Initial", "Last_Name", "DOB_Day", "DOB_Month", "DOB_Year", "Sex", "Crash_Date"),
      default_comparator = jaro_winkler(0.9), overwrite = TRUE)

    # Force 1 to 1 linkage
    p_4 <- select_n_to_m(p, "weight", var = "ntom", threshold = 2.2)

    # Keep only links with x id
    Linked.. <- link(p_4, all_x = TRUE, all_y = FALSE)
    # Create a data frame object of linked data attributes
    P_Link_Atts.. <- as.data.frame(p) %>% mutate(Id.x = as.character(x), Id.y = as.character(y))
    # Join weights
    Linked.. <- left_join(Linked.., P_Link_Atts.., by = c("Id.x", "Id.y"))
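
Since the real data cannot be shared, the join in question can be illustrated with a toy sketch in base R. It does not call reclin at all. It assumes that the `x` and `y` columns of the pairs object are row positions in the two input tables rather than values of an `Id` column, and every name and value below (`ems`, `partic`, `row_x`, `row_y`, the Ids and the weights) is made up for illustration. The `merge()` calls only stand in for the linking step; they are not the package's implementation.

```r
# Toy stand-ins for the two input tables (hypothetical data).
ems <- data.frame(Id = c("E10", "E11", "E12"),
                  Last_Name = c("Smith", "Jones", "Lee"),
                  stringsAsFactors = FALSE)
partic <- data.frame(Id = c("P7", "P8"),
                     Last_Name = c("Jones", "Smith"),
                     stringsAsFactors = FALSE)

# Toy stand-in for the pairs table. ASSUMPTION: x and y are row positions
# in ems and partic, not values of the Id column.
pairs <- data.frame(x = c(1L, 2L, 3L),
                    y = c(2L, 1L, 1L),
                    weight = c(3.1, 2.8, 0.4),
                    ntom = c(TRUE, TRUE, FALSE))

# In this toy data, row positions converted to text match none of the Ids,
# so a join of as.character(x) against Id would attach no weights.
n_matching <- sum(as.character(pairs$x) %in% ems$Id)

# Store the row positions explicitly so they can serve as join keys.
ems$row_x <- seq_len(nrow(ems))
partic$row_y <- seq_len(nrow(partic))

# Keep the selected pairs and attach both record sets by row position.
selected <- pairs[pairs$ntom, c("x", "y", "weight")]
linked <- merge(selected, ems, by.x = "x", by.y = "row_x")
linked <- merge(linked, partic, by.x = "y", by.y = "row_y",
                suffixes = c(".x", ".y"))

print(linked)
```

In this sketch every selected pair keeps its weight because the join keys are the row positions themselves. Whether the same holds for the real `Linked..` data depends on how `Id.x` and `Id.y` relate to the row order of `Select_Ems..` and `Select_Partic..`.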

JoshRoll, Aug 30 '21 18:08