DFplyr
Test bplyr integration
https://github.com/yonicd/bplyr
Appears to work for mutate and filter, even processing an S4 column
library(S4Vectors)
m <- mtcars[, c("cyl", "hp", "am", "gear", "disp")]
d <- as(m, "DataFrame")
d$gr <- GenomicRanges::GRanges("chrY", IRanges::IRanges(1:32, width=10))
d$gr2 <- GenomicRanges::GRanges("chrX", IRanges::IRanges(1:32, width = 10))
d$nl <- IRanges::NumericList(lapply(d$gear, function(n) round(rnorm(n), 2)))
d
#> DataFrame with 32 rows and 8 columns
#> cyl hp am gear disp
#> <numeric> <numeric> <numeric> <numeric> <numeric>
#> Mazda RX4 6 110 1 4 160
#> Mazda RX4 Wag 6 110 1 4 160
#> Datsun 710 4 93 1 4 108
#> Hornet 4 Drive 6 110 0 3 258
#> Hornet Sportabout 8 175 0 3 360
#> ... ... ... ... ... ...
#> Lotus Europa 4 113 1 5 95.1
#> Ford Pantera L 8 264 1 5 351
#> Ferrari Dino 6 175 1 5 145
#> Maserati Bora 8 335 1 5 301
#> Volvo 142E 4 109 1 4 121
#> gr gr2 nl
#> <GRanges> <GRanges> <NumericList>
#> Mazda RX4 chrY:1-10 chrX:1-10 -0.26,0.22,-1.33,...
#> Mazda RX4 Wag chrY:2-11 chrX:2-11 0.35,0.67,2.5,...
#> Datsun 710 chrY:3-12 chrX:3-12 0.47,-0.76,-1.91,...
#> Hornet 4 Drive chrY:4-13 chrX:4-13 -2.78,-1.82,0.81
#> Hornet Sportabout chrY:5-14 chrX:5-14 0.03,-1.51,1.01
#> ... ... ... ...
#> Lotus Europa chrY:28-37 chrX:28-37 0.29,1.11,-0.13,...
#> Ford Pantera L chrY:29-38 chrX:29-38 1.9,-1.43,-0.6,...
#> Ferrari Dino chrY:30-39 chrX:30-39 0.76,0.28,-0.16,...
#> Maserati Bora chrY:31-40 chrX:31-40 -0.14,0.96,1.52,...
#> Volvo 142E chrY:32-41 chrX:32-41 -0.49,0.54,-1.55,...
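# Capture each named argument as an expression, turn it into a
# "name <- expr" assignment string, and evaluate the assignments inside
# within() so that the columns of .data are visible by name.
mutateDF <- function(.data, ...) {
  FNS <- lapply(rlang::quos(...), rlang::quo_expr)
  EXPRS <- lapply(names(FNS), function(x) {
    sprintf('%s <- %s', x, deparse(FNS[[x]]))
  })
  within(.data, eval(parse(text = paste0(unlist(EXPRS), collapse = '\n'))))
}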
mutateDF(d, nl2 = 2 * nl)
#> Warning: `quo_expr()` is deprecated as of rlang 0.2.0.
#> Please use `quo_squash()` instead.
#> This warning is displayed once per session.
#> DataFrame with 32 rows and 9 columns
#> cyl hp am gear disp
#> <numeric> <numeric> <numeric> <numeric> <numeric>
#> Mazda RX4 6 110 1 4 160
#> Mazda RX4 Wag 6 110 1 4 160
#> Datsun 710 4 93 1 4 108
#> Hornet 4 Drive 6 110 0 3 258
#> Hornet Sportabout 8 175 0 3 360
#> ... ... ... ... ... ...
#> Lotus Europa 4 113 1 5 95.1
#> Ford Pantera L 8 264 1 5 351
#> Ferrari Dino 6 175 1 5 145
#> Maserati Bora 8 335 1 5 301
#> Volvo 142E 4 109 1 4 121
#> gr gr2 nl
#> <GRanges> <GRanges> <NumericList>
#> Mazda RX4 chrY:1-10 chrX:1-10 -0.26,0.22,-1.33,...
#> Mazda RX4 Wag chrY:2-11 chrX:2-11 0.35,0.67,2.5,...
#> Datsun 710 chrY:3-12 chrX:3-12 0.47,-0.76,-1.91,...
#> Hornet 4 Drive chrY:4-13 chrX:4-13 -2.78,-1.82,0.81
#> Hornet Sportabout chrY:5-14 chrX:5-14 0.03,-1.51,1.01
#> ... ... ... ...
#> Lotus Europa chrY:28-37 chrX:28-37 0.29,1.11,-0.13,...
#> Ford Pantera L chrY:29-38 chrX:29-38 1.9,-1.43,-0.6,...
#> Ferrari Dino chrY:30-39 chrX:30-39 0.76,0.28,-0.16,...
#> Maserati Bora chrY:31-40 chrX:31-40 -0.14,0.96,1.52,...
#> Volvo 142E chrY:32-41 chrX:32-41 -0.49,0.54,-1.55,...
#> nl2
#> <NumericList>
#> Mazda RX4 -0.52,0.44,-2.66,...
#> Mazda RX4 Wag 0.7,1.34,5,...
#> Datsun 710 0.94,-1.52,-3.82,...
#> Hornet 4 Drive -5.56,-3.64,1.62
#> Hornet Sportabout 0.06,-3.02,2.02
#> ... ...
#> Lotus Europa 0.58,2.22,-0.26,...
#> Ford Pantera L 3.8,-2.86,-1.2,...
#> Ferrari Dino 1.52,0.56,-0.32,...
#> Maserati Bora -0.28,1.92,3.04,...
#> Volvo 142E -0.98,1.08,-3.1,...
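# Capture the filter condition unevaluated and let subset() evaluate it
# against the columns of .data.
filterDF <- function(.data, ...) {
  subset(.data, {
    eval(rlang::quo_expr(rlang::quo(...)))
  })
}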
filterDF(d, lengths(nl) == 5)
#> DataFrame with 5 rows and 8 columns
#> cyl hp am gear disp
#> <numeric> <numeric> <numeric> <numeric> <numeric>
#> Porsche 914-2 4 91 1 5 120.3
#> Lotus Europa 4 113 1 5 95.1
#> Ford Pantera L 8 264 1 5 351
#> Ferrari Dino 6 175 1 5 145
#> Maserati Bora 8 335 1 5 301
#> gr gr2 nl
#> <GRanges> <GRanges> <NumericList>
#> Porsche 914-2 chrY:27-36 chrX:27-36 0.27,0.77,0.38,...
#> Lotus Europa chrY:28-37 chrX:28-37 0.29,1.11,-0.13,...
#> Ford Pantera L chrY:29-38 chrX:29-38 1.9,-1.43,-0.6,...
#> Ferrari Dino chrY:30-39 chrX:30-39 0.76,0.28,-0.16,...
#> Maserati Bora chrY:31-40 chrX:31-40 -0.14,0.96,1.52,...
Created on 2020-01-29 by the reprex package (v0.3.0)
(with dispatch, of course).
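The deprecation warning in the reprex suggests swapping `quo_expr()` for `quo_squash()`. A minimal sketch of that swap follows; it also evaluates the captured expressions directly instead of going through `parse(text = ...)`. The name `mutateDF2` is made up for illustration, it assumes every argument is named, and I have only reasoned about it for a plain data.frame, so treat it as a starting point rather than a tested replacement for `DataFrame` input.

```r
# Hypothetical variant of mutateDF (not part of bplyr or DFplyr).
# Assumes every argument in ... is named, e.g. mutateDF2(df, new = 2 * old).
mutateDF2 <- function(.data, ...) {
  # Capture the arguments as quosures and reduce each one to a bare expression.
  exprs <- lapply(rlang::enquos(...), rlang::quo_squash)
  # Remember the caller's environment for variables that are not columns.
  env <- parent.frame()
  for (nm in names(exprs)) {
    # Evaluate with the columns visible by name; later expressions can
    # refer to columns created by earlier ones.
    .data[[nm]] <- eval(exprs[[nm]], as.list(.data), env)
  }
  .data
}

out <- mutateDF2(mtcars, hp2 = 2 * hp)
head(out[, c("hp", "hp2")])
```

The same `quo_squash()` swap would apply to `filterDF`.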
It doesn't seem to work to call the b_mutate methods internally, but maybe I'm doing something wrong. Collaboration, @yonicd?
I’ll take a look on my end
Progress... https://github.com/jonocarroll/DFplyr/tree/bplyr_integration
The README renders in the current form (including S4 columns). I haven't finished, but I found a lot of edge cases and have dealt with them.
Looks better! A few q’s (probably me not grokking)
You are importing dplyr?
Aren’t the function names causing namespace conflicts?
If you are using base underneath why would the user want to install dplyr?
I only import the generics; without those there's no dispatch. You reclassed everything and wrote new generics, but writing a method for a new class is 'supposed' to be the way to extend a generic. Plus, this way mutate works whether you pass it a data.frame or a DataFrame. My original idea was to use the tbl methods under the hood, but there are glaring issues with that.
I could write new generics but that breaks dplyr if it's also attached.
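To make the "write a method for a new class" point concrete, here is a rough base-R sketch. The `mutate` generic defined here is only a stand-in for the one that would be imported from dplyr, and `my_frame` is a hypothetical S3 class, not `DataFrame`; the real methods have far more edge cases to handle.

```r
# Stand-in for an imported generic (assumption: in a package this would be
# imported from dplyr rather than redefined).
mutate <- function(.data, ...) UseMethod("mutate")

# Method for ordinary data.frames, using base R only.
mutate.data.frame <- function(.data, ...) {
  exprs <- eval(substitute(alist(...)))  # capture the arguments unevaluated
  env <- parent.frame()
  for (nm in names(exprs)) {
    .data[[nm]] <- eval(exprs[[nm]], .data, env)
  }
  .data
}

# Method for a hypothetical new class: same verb, different container.
mutate.my_frame <- function(.data, ...) {
  exprs <- eval(substitute(alist(...)))
  env <- parent.frame()
  for (nm in names(exprs)) {
    .data[[nm]] <- eval(exprs[[nm]], unclass(.data), env)
  }
  .data
}

df <- data.frame(x = 1:3)
mf <- structure(list(x = 1:3), class = "my_frame")

res_df <- mutate(df, y = x * 2)   # dispatches to mutate.data.frame
res_mf <- mutate(mf, y = x + 1L)  # dispatches to mutate.my_frame
```

The caller writes `mutate()` in both cases and the class of the first argument picks the method, which is why only the generic needs to be imported.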
Ok. The original noplyr was like that but still caused tons of namespace problems. I’ll look more closely at how you did it to figure out what I did wrong there. Cheers ;)