reclin
multiple similarities on the same column
In some cases people may want several similarity types on the same column. At first glance, this does not seem to be well supported.
I was using the following function to generate multiple similarities for the same variable, but the result seems to interact poorly with score_simsum
later in the evaluation. Support for multiple similarity scores on the same column would allow techniques like problink_em
and other [un]supervised techniques to find the best similarity metric for each column, which may differ from one column to the next.
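library(reclin)
library(dplyr)
library(purrr)

#' Compare pairs with several comparators on the same columns
#'
#' Runs compare_pairs() once per comparator and appends the resulting
#' similarity columns to `p`, suffixed with the comparator name
#' (e.g. `lastname_lcs`, `lastname_jw`).
compare_pairs_multi <- function(p,
                                by,
                                default_comparators = list("lcs" = lcs(), "jw" = jaro_winkler()),
                                ...) {
  bind_cols(
    p,
    names(default_comparators) %>%
      map_dfc(function(comp_nm) {
        p %>%
          compare_pairs(by = by,
                        default_comparator = default_comparators[[comp_nm]], ...) %>%
          # all_of() makes the selection by character vector explicit
          select(all_of(by)) %>%
          rename_all(~ paste0(.x, "_", comp_nm))
      })
  )
}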
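
For reference, a minimal usage sketch, assuming compare_pairs_multi() from above is already defined and using the linkexample1/linkexample2 data shipped with reclin. Blocking on postcode and comparing lastname and firstname are illustrative choices only; large = FALSE is passed on the assumption that the dplyr verbs inside the function need the pairs as a regular data.frame.

```r
library(reclin)
library(dplyr)
library(purrr)

data("linkexample1", "linkexample2")

# Generate candidate pairs; large = FALSE keeps them in an in-memory data.frame
p <- pair_blocking(linkexample1, linkexample2, "postcode", large = FALSE)

# Adds one similarity column per variable/comparator combination,
# named <variable>_<comparator> (for example lastname_lcs and lastname_jw)
p_multi <- compare_pairs_multi(p, by = c("lastname", "firstname"))

names(p_multi)
```

The suffixed columns are what a later scoring step would have to be pointed at, since the original by names no longer exist in the result.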