warnings when using new github dtplyr

Open kendonB opened this issue 5 years ago • 10 comments

library(disk.frame)
library(dtplyr)
library(tidyverse)
iris_df = as.disk.frame(iris)
iris_df %>% 
  filter(Sepal.Length > 7) %>% 
  collect()
#> Warning: You are using a dplyr method on a raw data.table, which will call
#> the data frame implementation, and is likely to be inefficient.
#> 
#> To suppress this message, either generate a data.table translation
#> with `lazy_dt()` or convert to a data frame or tibble with
#> `as.data.frame()`/`as_tibble()`.

#> Warning: You are using a dplyr method on a raw data.table, which will call
#> the data frame implementation, and is likely to be inefficient.
#> 
#> To suppress this message, either generate a data.table translation
#> with `lazy_dt()` or convert to a data frame or tibble with
#> `as.data.frame()`/`as_tibble()`.

#> Warning: You are using a dplyr method on a raw data.table, which will call
#> the data frame implementation, and is likely to be inefficient.
#> 
#> To suppress this message, either generate a data.table translation
#> with `lazy_dt()` or convert to a data frame or tibble with
#> `as.data.frame()`/`as_tibble()`.

#> Warning: You are using a dplyr method on a raw data.table, which will call
#> the data frame implementation, and is likely to be inefficient.
#> 
#> To suppress this message, either generate a data.table translation
#> with `lazy_dt()` or convert to a data frame or tibble with
#> `as.data.frame()`/`as_tibble()`.

#> Warning: You are using a dplyr method on a raw data.table, which will call
#> the data frame implementation, and is likely to be inefficient.
#> 
#> To suppress this message, either generate a data.table translation
#> with `lazy_dt()` or convert to a data frame or tibble with
#> `as.data.frame()`/`as_tibble()`.

#> Warning: You are using a dplyr method on a raw data.table, which will call
#> the data frame implementation, and is likely to be inefficient.
#> 
#> To suppress this message, either generate a data.table translation
#> with `lazy_dt()` or convert to a data frame or tibble with
#> `as.data.frame()`/`as_tibble()`.
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#> 1           7.1         3.0          5.9         2.1 virginica
#> 2           7.6         3.0          6.6         2.1 virginica
#> 3           7.3         2.9          6.3         1.8 virginica
#> 4           7.2         3.6          6.1         2.5 virginica
#> 5           7.7         3.8          6.7         2.2 virginica
#> 6           7.7         2.6          6.9         2.3 virginica
#> 7           7.7         2.8          6.7         2.0 virginica
#> 8           7.2         3.2          6.0         1.8 virginica
#> 9           7.2         3.0          5.8         1.6 virginica
#> 10          7.4         2.8          6.1         1.9 virginica
#> 11          7.9         3.8          6.4         2.0 virginica
#> 12          7.7         3.0          6.1         2.3 virginica

Created on 2019-09-24 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> - Session info ----------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       Pacific/Auckland            
#>  date     2019-09-24                  
#> 
#> - Packages --------------------------------------------------------------
#>  package         * version    date       lib
#>  assertthat        0.2.1      2019-03-21 [1]
#>  backports         1.1.4      2019-04-10 [1]
#>  benchmarkme       1.0.2      2019-08-19 [1]
#>  benchmarkmeData   1.0.2      2019-08-19 [1]
#>  bigreadr          0.1.10     2019-09-17 [1]
#>  bit               1.1-14     2018-05-29 [1]
#>  bit64             0.9-7      2017-05-08 [1]
#>  broom             0.5.2      2019-04-07 [1]
#>  callr             3.2.0      2019-03-15 [1]
#>  cellranger        1.1.0      2016-07-27 [1]
#>  cli               1.1.0      2019-03-19 [1]
#>  codetools         0.2-16     2018-12-24 [2]
#>  colorspace        1.4-1      2019-03-18 [1]
#>  crayon            1.3.4      2017-09-16 [1]
#>  data.table        1.12.2     2019-04-07 [1]
#>  desc              1.2.0      2018-05-01 [1]
#>  devtools          2.0.2      2019-04-08 [1]
#>  digest            0.6.21     2019-09-20 [1]
#>  disk.frame      * 0.1.1.999  2019-09-24 [1]
#>  doParallel        1.0.15     2019-08-02 [1]
#>  dplyr           * 0.8.3      2019-07-04 [1]
#>  dtplyr          * 0.0.3.9000 2019-09-24 [1]
#>  evaluate          0.13       2019-02-12 [1]
#>  forcats         * 0.4.0      2019-02-17 [1]
#>  foreach           1.4.7      2019-07-27 [1]
#>  fs                1.3.1      2019-05-06 [1]
#>  fst               0.9.0      2019-04-09 [1]
#>  furrr             0.1.0      2018-05-16 [1]
#>  future            1.14.0     2019-07-02 [1]
#>  future.apply      1.3.0      2019-06-18 [1]
#>  generics          0.0.2      2018-11-29 [1]
#>  ggplot2         * 3.1.1      2019-04-07 [1]
#>  globals           0.12.4     2018-10-11 [1]
#>  glue              1.3.1      2019-03-12 [1]
#>  gtable            0.3.0      2019-03-25 [1]
#>  haven             2.1.0      2019-02-19 [1]
#>  highr             0.8        2019-03-20 [1]
#>  hms               0.4.2      2018-03-10 [1]
#>  htmltools         0.3.6      2017-04-28 [1]
#>  httr              1.4.1      2019-08-05 [1]
#>  iterators         1.0.12     2019-07-26 [1]
#>  jsonlite          1.6        2018-12-07 [1]
#>  knitr             1.23       2019-05-18 [1]
#>  lattice           0.20-38    2018-11-04 [2]
#>  lazyeval          0.2.2      2019-03-15 [1]
#>  listenv           0.7.0      2018-01-21 [1]
#>  lubridate         1.7.4      2018-04-11 [1]
#>  magrittr          1.5        2014-11-22 [1]
#>  Matrix            1.2-17     2019-03-22 [2]
#>  memoise           1.1.0      2017-04-21 [1]
#>  modelr            0.1.4      2019-02-18 [1]
#>  munsell           0.5.0      2018-06-12 [1]
#>  nlme              3.1-140    2019-05-12 [2]
#>  pillar            1.4.2      2019-06-29 [1]
#>  pkgbuild          1.0.3      2019-03-20 [1]
#>  pkgconfig         2.0.3      2019-09-22 [1]
#>  pkgload           1.0.2      2018-10-29 [1]
#>  plyr              1.8.4      2016-06-08 [1]
#>  prettyunits       1.0.2      2015-07-13 [1]
#>  processx          3.3.1      2019-05-08 [1]
#>  pryr              0.1.4      2018-02-18 [1]
#>  ps                1.3.0      2018-12-21 [1]
#>  purrr           * 0.3.2      2019-03-15 [1]
#>  R6                2.4.0      2019-02-14 [1]
#>  Rcpp              1.0.2      2019-07-25 [1]
#>  readr           * 1.3.1      2018-12-21 [1]
#>  readxl            1.3.1      2019-03-13 [1]
#>  remotes           2.0.4      2019-04-10 [1]
#>  rlang             0.4.0      2019-06-25 [1]
#>  rmarkdown         1.12       2019-03-14 [1]
#>  rprojroot         1.3-2      2018-01-03 [1]
#>  rvest             0.3.4      2019-05-15 [1]
#>  scales            1.0.0      2018-08-09 [1]
#>  sessioninfo       1.1.1      2018-11-05 [1]
#>  stringi           1.4.3      2019-03-12 [1]
#>  stringr         * 1.4.0      2019-02-10 [1]
#>  testthat          2.1.1      2019-04-23 [1]
#>  tibble          * 2.1.3      2019-06-06 [1]
#>  tidyr           * 0.8.3      2019-03-01 [1]
#>  tidyselect        0.2.5      2018-10-11 [1]
#>  tidyverse       * 1.2.1      2017-11-14 [1]
#>  usethis           1.5.0      2019-04-07 [1]
#>  withr             2.1.2      2018-03-15 [1]
#>  xfun              0.7        2019-05-14 [1]
#>  xml2              1.2.0      2018-01-24 [1]
#>  yaml              2.2.0      2018-07-25 [1]
#>  source                               
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.1)                       
#>  Github (xiaodaigh/disk.frame@0883715)
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  Github (tidyverse/dtplyr@4d8d6da)    
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#> 
#> [1] C:/Users/kmbel/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.1/library

kendonB, Sep 24 '19 03:09

Thanks for this! I turned off dtplyr support before the release of v0.1.0 as there were many cases where dtplyr didn't work. I will look to find a solution for this as I think lazy_dt is a new API.

Here is a workaround, although it is painful:

# Workaround: wrap each chunk in lazy_dt() before applying the dplyr verbs
aa = iris_df %>% 
  map(~{
    dtplyr::lazy_dt(.x) %>% 
      filter(Sepal.Length > 7) %>% 
      collect()
  }) %>% 
  collect()

I will need to think about a good way to incorporate dtplyr, which was in the original design. I'm also keen to start work on this once the new dtplyr is on CRAN.

xiaodaigh, Sep 24 '19 04:09

I don't think you necessarily need to support dtplyr to address the current issue. The warnings seem to come about because you use data.table in the background then call dplyr verbs. You can probably fix this issue by converting to data.frame before running the dplyr functions.
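A minimal sketch of that idea, assuming a chunk arrives in memory as a data.table (the variable names here are illustrative, not disk.frame internals):

```r
library(data.table)
library(dplyr)

# Stand-in for one chunk that has been read into memory as a data.table.
chunk <- as.data.table(iris)

# Drop the data.table class first, so the dplyr verb sees a plain data.frame.
result <- chunk %>%
  as.data.frame() %>%
  filter(Sepal.Length > 7)
```

The result is a plain data.frame, so no data.table-specific dplyr method is involved in the `filter()` call.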

kendonB, Sep 24 '19 04:09

I see. Good point. But the warning only appears if you load dtplyr, so I take it to mean that if you turn on dtplyr then that's what you want to use, instead of converting to data.frame first? I think I should support dtplyr once it's on CRAN anyway, as a solidarity measure between tidyverse and data.table. :)

xiaodaigh, Sep 24 '19 04:09

I don't think you should assume that the user wants to use dtplyr just because it's loaded. The interface would ideally be as close to the in-memory interface as possible.

i.e. if the user were to call lazy_dt on the disk.frame object first, then I'd go ahead and call lazy_dt on the data.frame objects once they're in memory (once dtplyr is on CRAN). Otherwise, I wouldn't use data.table at all unless you have a really good reason. Not everything is faster in data.table; left_join, for example, I find is much better than the equivalent data.table merge.
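Roughly, the opt-in behaviour could look like the sketch below. It uses a plain list of data frames as a stand-in for a disk.frame, and `apply_to_chunks()` and its `use_dt` argument are hypothetical names for illustration, not part of disk.frame or dtplyr:

```r
library(dplyr)
library(dtplyr)

# Stand-in for a disk.frame: a plain list of in-memory chunks.
chunks <- split(iris, iris$Species)

# Hypothetical helper: apply f to every chunk, using dtplyr only on request.
apply_to_chunks <- function(chunks, f, use_dt = FALSE) {
  per_chunk <- lapply(chunks, function(chunk) {
    if (use_dt) {
      # Opt-in path: translate via dtplyr, then materialise a data.frame.
      as.data.frame(f(lazy_dt(chunk)))
    } else {
      # Default path: plain data.frame and plain dplyr.
      f(as.data.frame(chunk))
    }
  })
  bind_rows(per_chunk)
}

keep_long <- function(d) filter(d, Sepal.Length > 7)

plain  <- apply_to_chunks(chunks, keep_long)
via_dt <- apply_to_chunks(chunks, keep_long, use_dt = TRUE)
```

Both calls should return the same rows; the only difference is which backend does the filtering inside each chunk.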

kendonB, Sep 24 '19 04:09

Alright. Implementing a lazy_dt sounds reasonable because it's close to the dtplyr syntax.

xiaodaigh, Sep 24 '19 04:09

You might also be able to get them to change lazy_dt to a generic if you are quick!
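To illustrate why a generic would help, here is a base R sketch of S3 dispatch. `as_lazy()` and the `chunked`, `lazy_single` and `lazy_chunked` classes are made up for this example; this is not dtplyr's `lazy_dt()`, only the mechanism a generic would give a package like disk.frame:

```r
# Hypothetical generic: dispatches on the class of x.
as_lazy <- function(x, ...) UseMethod("as_lazy")

# Method for ordinary data frames: wrap the data in a lazy container.
as_lazy.data.frame <- function(x, ...) {
  structure(list(data = x), class = "lazy_single")
}

# Method another package could register for its own chunked class:
# wrap every chunk with the data.frame method.
as_lazy.chunked <- function(x, ...) {
  structure(lapply(unclass(x), as_lazy), class = "lazy_chunked")
}

# A toy chunked object: a classed list of data frames.
chunked <- structure(split(iris, iris$Species), class = "chunked")

single <- as_lazy(iris)     # uses as_lazy.data.frame
many   <- as_lazy(chunked)  # uses as_lazy.chunked
```

With a generic, the same user-facing call works on both object types, and the chunked package only has to supply its own method.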

kendonB, Sep 24 '19 05:09

Good thinking! See https://github.com/tidyverse/dtplyr/issues/105

xiaodaigh, Sep 24 '19 05:09

I still need to implement lazy_dt, but will wait for the new dtplyr to go on CRAN first.

xiaodaigh, Sep 24 '19 06:09

The latest GitHub version of disk.frame got rid of the warnings, but I still need to do the lazy_dt implementation at some point.

xiaodaigh, Sep 24 '19 06:09

> You might also be able to get them to change lazy_dt to a generic if you are quick!

Mr Hadley has closed the issue as won't-fix. I don't follow the logic exactly, but I don't feel like arguing; I think they are busy enough. I will figure out a way to accommodate this.

xiaodaigh, Sep 26 '19 14:09