multidplyr
Implement `distinct`
I have a problem running calculations on several cores with multidplyr in R. I assign a group number to each row of my data (the data is grouped by this number, so rows with number 1 are sent to worker 1, and so on), as in the code below:
group <- rep(1:cores, length.out = nrow(dane))
dane <- bind_cols(tibble(group), dane)
cluster <- multidplyr::new_cluster(cores)
dane <-
dane %>%
group_by(group) %>%
partition(cluster)
I also send each worker the libraries, values, and functions it needs for the calculation.
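A minimal sketch of that setup step, assuming a two-worker cluster. `threshold` and `scale_value` are hypothetical names used only for illustration; `cluster_library()`, `cluster_assign()`, and `cluster_call()` are multidplyr functions.

```r
library(multidplyr)

cluster <- new_cluster(2)

# Attach a package on every worker.
cluster_library(cluster, "dplyr")

# Create a value and a helper function on every worker.
# `threshold` and `scale_value` are hypothetical names.
cluster_assign(
  cluster,
  threshold = 10,
  scale_value = function(x) x / max(x)
)

# Ask each worker for the value to confirm it arrived.
worker_values <- cluster_call(cluster, threshold)
```

`cluster_call()` returns one result per worker, so `worker_values` should be a list of length 2 here.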
After the data is split and sent to the cluster, I want to run the calculations and collect the results:
dane %>% select() %>% distinct() %>% ...
Unfortunately, I get the error below and do not know how to solve it. (If I use unique() instead of distinct(), a different error appears.)
Error in UseMethod("distinct"): no applicable method for 'distinct' applied to an object of class "multidplyr_party_df"
Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you! If you've never heard of a reprex before, start by reading about the reprex package, including the advice further down the page. Please make sure your reprex is created with the reprex package as it gives nicely formatted output and avoids a number of common pitfalls.
Dear Hadley, unfortunately reprex() gives strange errors when I try to create the example code:
library(dplyr, warn.conflicts = FALSE)
numCores <- detectCores()
#> Error in detectCores(): could not find function "detectCores"
cores <- numCores - 4
#> Error in eval(expr, envir, enclos): object 'numCores' not found
group <- rep(1:cores, length.out = nrow(flights))
#> Error in eval(expr, envir, enclos): object 'cores' not found
flights <- bind_cols(tibble(group), flights)
#> Error in eval_tidy(xs[[j]], mask): object 'group' not found
cluster <- multidplyr::new_cluster(cores)
#> Error in integer(n): object 'cores' not found
flights <-
+ flights %>%
+ group_by(group) %>%
+ partition(cluster)
#> Error in FUN(left): invalid argument to unary operator
#> Error in is_cluster(cluster): object 'cluster' not found
#> Error in is_cluster(cluster): object 'cluster' not found
#> Error in is_cluster(cluster): object 'cluster' not found
cluster_copy(cluster, 'flights')
#> Error in is_cluster(cluster): object 'cluster' not found
flights <-
+ flights %>%
+ select(contains("dest"), everything()) %>%
+ select(`ID`=1, group = 2, abstract=3) %>%
+ distinct()
#> Error in FUN(left): invalid argument to unary operator
So I am pasting ordinary code that uses publicly available data (from the nycflights13 package) and gives the same error as in my case:
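library(dplyr, warn.conflicts = FALSE)
library(parallel)      # provides detectCores()
library(multidplyr)    # provides partition() and cluster_copy()
library(nycflights13)  # provides the flights data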
numCores <- detectCores()
cores <- numCores - 4
group <- rep(1:cores, length.out = nrow(flights))
flights <- bind_cols(tibble(group), flights)
cluster <- multidplyr::new_cluster(cores)
flights <-
flights %>%
group_by(group) %>%
partition(cluster)
cluster_copy(cluster, 'flights')
flights <-
flights %>%
select(contains("dest"), everything()) %>%
select(`dest`=1, group = 2, origin=3) %>%
distinct()
When you paste this code into the RStudio console and run it, you get this error: Error in UseMethod("distinct"): no applicable method for 'distinct' applied to an object of class "multidplyr_party_df"
Here is a minimal reprex:
library(dplyr, warn.conflicts = FALSE)
cluster <- multidplyr::new_cluster(2)
mtcars2 <- multidplyr::partition(mtcars, cluster)
mtcars2 %>% distinct()
#> Error in UseMethod("distinct"): no applicable method for 'distinct' applied to an object of class "multidplyr_party_df"
Created on 2021-05-21 by the reprex package (v2.0.0)
Looks like I forgot to provide a distinct method.
Dear Hadley, now I understand the error. My question is: will you add a distinct() method to multidplyr in the near future, or how can I add this method in my own code?
I will add it next time I work on multidplyr.
family has the same issue.
Any chance one of you came up with a fix for this?
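One possible workaround, offered as a sketch rather than an official fix: run the verbs multidplyr does support on the workers, then `collect()` the rows back into an ordinary tibble and call `distinct()` locally. This assumes the collected result fits in memory on the main process. The example uses `mtcars` and the columns `cyl` and `gear` purely for illustration.

```r
library(dplyr, warn.conflicts = FALSE)
library(multidplyr)

cluster <- new_cluster(2)
mtcars2 <- partition(mtcars, cluster)

# select() runs on the workers; collect() brings the rows back as a
# regular tibble, where distinct() has a method.
result <- mtcars2 %>%
  select(cyl, gear) %>%
  collect() %>%
  distinct()
```

De-duplicating after `collect()` also removes duplicates that sit on different workers, which a per-worker `distinct()` could not do on its own.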