fuzzyjoin
Naming `distance_col` when matching along multiple variables
I'm experimenting with matching along n variables (e.g. `x1` and `x2`) and want to keep track of the distance for each variable (`distance_col = "distance"`). This works, but the resulting data frame gains n + 1 columns: a distance measure for each variable with the corresponding prefix (e.g. `x1.distance`), plus the original `distance` column, which contains only `NA`s. It would be nice if this all-`NA` column were dropped automatically.
library(tidyverse)
library(fuzzyjoin)
ex_1 <- tibble(
  x1 = c("how", "now", "brown", "cow"),
  x2 = c("what", "do", "I", "know")
)
ex_2 <- tibble(
  x1 = c("hw", "nw", "brwn", "cw"),
  x2 = c("wht", "d", "I", "knw")
)
stringdist_inner_join(ex_1, ex_2, by = c("x1", "x2"),
                      method = "lv",
                      distance_col = "distance")
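
In the meantime, a possible workaround is to drop the extra column by hand after the join. The sketch below assumes the behavior described above (an all-`NA` column named exactly `distance` alongside the prefixed per-variable columns); the names `joined` and `cleaned` are only illustrative. Using `any_of()` means the call should not error if the column is ever absent.

```r
library(dplyr)
library(fuzzyjoin)

ex_1 <- tibble(
  x1 = c("how", "now", "brown", "cow"),
  x2 = c("what", "do", "I", "know")
)
ex_2 <- tibble(
  x1 = c("hw", "nw", "brwn", "cw"),
  x2 = c("wht", "d", "I", "knw")
)

# Same join as in the report; method = "lv" is Levenshtein distance
joined <- stringdist_inner_join(ex_1, ex_2, by = c("x1", "x2"),
                                method = "lv",
                                distance_col = "distance")

# Drop the unprefixed `distance` column if it is present.
# Assumption: the per-variable columns (x1.distance, x2.distance, as
# described in this report) carry the values worth keeping.
cleaned <- joined %>% select(-any_of("distance"))

names(cleaned)
```

This only hides the symptom on the user side; it does not change what the join itself returns.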