Any way to keep additional columns when using the pairwise_ functions?

Open hope-data-science opened this issue 4 years ago • 0 comments

I have benefited from widyr::pairwise_count for years; it is really fast. Recently, however, I needed all the pairwise combinations within each group, and this time I want to keep the group ID in the result. My usual approach is to mutate a copy of the id column (usually named "id2"), so that I can group by one id column and pass the other to pairwise_count. But this is really slow! Let me give an example, starting with the plain call:

library(dplyr)
library(widyr)

dat <- tibble(group = rep(1:5, each = 2),
              letter = c("a", "b",
                         "a", "c",
                         "a", "c",
                         "b", "e",
                         "b", "f"))

# count the number of times two letters appear together
pairwise_count(dat, letter, group)

# A tibble: 8 x 3
  item1 item2     n
  <chr> <chr> <dbl>
1 b     a         1
2 c     a         2
3 a     b         1
4 e     b         1
5 f     b         1
6 a     c         2
7 b     e         1
8 b     f         1

Is there any way to also get the group number in the result, as in the output below?

library(dplyr)
library(widyr)

dat <- tibble(group = rep(1:5, each = 2),
              letter = c("a", "b",
                         "a", "c",
                         "a", "c",
                         "b", "e",
                         "b", "f"))

dat %>%
  mutate(group2 = group) %>%
  group_by(group) %>%
  pairwise_count(letter, group2) %>%
  ungroup()

# A tibble: 10 x 4
   group item1 item2     n
   <int> <chr> <chr> <dbl>
 1     1 b     a         1
 2     1 a     b         1
 3     2 c     a         1
 4     2 a     c         1
 5     3 c     a         1
 6     3 a     c         1
 7     4 e     b         1
 8     4 b     e         1
 9     5 f     b         1
10     5 b     f         1

But this becomes rather slow as the number of groups grows. Is there any way to make it faster? Thanks.
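
For reference, one direction that might avoid the per-group overhead is a plain self-join on the group column instead of a grouped pairwise_count call. The sketch below is only an assumption about what could work, not a widyr feature: the names `items` and `pairs_by_group` are made up for illustration, and the `relationship` argument assumes dplyr 1.1.0 or later (on older versions it would need to be dropped).

```r
library(dplyr)

dat <- tibble(group = rep(1:5, each = 2),
              letter = c("a", "b",
                         "a", "c",
                         "a", "c",
                         "b", "e",
                         "b", "f"))

# Keep one row per (group, letter) so repeated letters are not double counted.
items <- distinct(dat, group, letter)

pairs_by_group <- items %>%
  rename(item1 = letter) %>%
  # Self-join on group: every letter meets every letter of the same group.
  # `relationship` (dplyr >= 1.1.0) silences the many-to-many warning.
  inner_join(rename(items, item2 = letter),
             by = "group",
             relationship = "many-to-many") %>%
  # Drop self-pairs such as (a, a).
  filter(item1 != item2) %>%
  # Each remaining row is one co-occurrence inside one group.
  count(group, item1, item2)

pairs_by_group
```

On the example data this gives one row per group and ordered pair, with the group column kept. Summing over groups with `count(pairs_by_group, item1, item2, wt = n)` should give the same pairs and counts as the ungrouped pairwise_count call (with `n` as an integer rather than a double). Whether this is actually faster on large data is untested here; the join materialises every within-group pair, so memory use grows quickly for groups with many items.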

hope-data-science, Jan 09 '20 14:01