
Naming `distance_col` when matching along multiple variables

Open spspitze opened this issue 2 years ago • 0 comments

I'm experimenting with matching along n variables (e.g., `x1` and `x2`) and want to keep track of the distance for each variable (`distance_col = "distance"`). This works, but the resulting data frame contains n + 1 distance columns: one per variable, named with the variable as a prefix (`x1.distance`, `x2.distance`), plus the unprefixed `distance` column, which contains only `NA`s. It would be nice if that all-`NA` column were dropped automatically.

```r
library(tidyverse)
library(fuzzyjoin)

ex_1 <- tibble(
  x1 = c("how", "now", "brown", "cow"),
  x2 = c("what", "do", "I", "know")
)

ex_2 <- tibble(
  x1 = c("hw", "nw", "brwn", "cw"),
  x2 = c("wht", "d", "I", "knw")
)

stringdist_inner_join(ex_1, ex_2, by = c("x1", "x2"),
                      method = "lv",
                      distance_col = "distance")
```
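In the meantime, one possible workaround is to drop any column that is entirely `NA` after the join. The sketch below is only an illustration: `drop_all_na_cols()` is a hypothetical helper (not part of fuzzyjoin), and `joined` is a hand-built stand-in for the join result with made-up values, so the block runs without fuzzyjoin installed.

```r
# Hypothetical helper (not part of fuzzyjoin): drop every column that is all NA.
drop_all_na_cols <- function(df) {
  # With zero rows every column is trivially "all NA", so leave the frame alone.
  if (nrow(df) == 0) return(df)
  keep <- vapply(df, function(col) !all(is.na(col)), logical(1))
  df[, keep, drop = FALSE]
}

# Hand-built stand-in for the join result described above.
# Column names and values are illustrative, not actual fuzzyjoin output.
joined <- data.frame(
  x1.x        = c("how", "now"),
  x1.y        = c("hw", "nw"),
  x1.distance = c(1, 1),
  x2.distance = c(1, 1),
  distance    = c(NA_real_, NA_real_)
)

cleaned <- drop_all_na_cols(joined)
names(cleaned)
```

The same helper could be applied directly to the output of `stringdist_inner_join()`. If the column name is known in advance, `dplyr::select(result, -distance)` is a simpler alternative.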

spspitze, Mar 09 '22 21:03