reclin

multiple similarities on the same column

Open hswerdfe opened this issue 3 years ago • 0 comments

In some cases, people may want to apply multiple similarity types to the same column. At first glance, this does not seem to be well supported.

I was using the following function to generate multiple similarities for the same variable, but the result seems to interact poorly with score_simsum later in the evaluation. Support for multiple similarity scores on the same column would allow techniques like problink_em and other [un]supervised techniques to find the best similarity metric for each column, which may differ from column to column.

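library(dplyr)
library(purrr)
library(reclin)

#' Compare pairs on the same columns with several comparators.
#' Appends one column per (variable, comparator), named <variable>_<comparator>.
compare_pairs_multi <- function(p,
                                by,
                                default_comparators = list("lcs" = lcs(), "jw" = jaro_winkler()),
                                ...) {
    bind_cols(
        p,
        names(default_comparators) %>%
            map_dfc(function(comp_nm) {
                # Run the comparison once per comparator and keep only the compared columns
                p %>%
                    compare_pairs(by = by,
                                  default_comparator = default_comparators[[comp_nm]], ...) %>%
                    select(all_of(by)) %>%
                    # Suffix each column with the comparator name to avoid clashes
                    rename_with(~ paste0(.x, "_", comp_nm))
            })
    )
}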
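
To make the intended output shape concrete, here is a minimal base-R sketch that does not use reclin at all. The comparators `exact_sim` and `lev_sim`, the toy `pairs` table, and its column names are all hypothetical stand-ins for illustration; they are not reclin's `lcs()` or `jaro_winkler()`. It only shows the idea of one similarity column per comparator for the same variable.

```r
# Hypothetical stand-in comparators built on base R only
# (assumption: these are NOT reclin's lcs()/jaro_winkler()).
exact_sim <- function(x, y) as.numeric(x == y)
lev_sim <- function(x, y) {
    # Pairwise Levenshtein distance, scaled to a similarity in [0, 1]
    d <- mapply(function(a, b) adist(a, b)[1, 1], x, y, USE.NAMES = FALSE)
    1 - d / pmax(nchar(x), nchar(y), 1)
}

# Toy pairs table: one row per candidate pair, holding the same
# variable ("name") from both data sets.
pairs <- data.frame(
    name_x = c("smith", "jones", "brown"),
    name_y = c("smith", "jonas", "green"),
    stringsAsFactors = FALSE
)

comparators <- list(exact = exact_sim, lev = lev_sim)

# Add one similarity column per comparator, named <variable>_<comparator>.
for (comp_nm in names(comparators)) {
    pairs[[paste0("name_", comp_nm)]] <-
        comparators[[comp_nm]](pairs$name_x, pairs$name_y)
}

print(pairs)
```

With columns laid out like this, a downstream scoring step can treat `name_exact` and `name_lev` as separate features and weigh them independently.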

hswerdfe, Apr 14 '21 13:04