vctrs
vctrs copied to clipboard
Proof of concept) ALTREP compact integer reps
CC @lionel- @romainfrancois
This is an implementation of an ALTREP compact integer rep, inspired by talking with @lionel- about this being a possible way forward for introducing efficient recycling.
The idea is that:
-
vec_recycle()
can return this when recycling a size 1 integer to size n, avoiding a recycle -
vec_assign()
knows how to work with these in a clever way, reusing that size 1 value
In the implementation, I tried just allowing all ALTREP()
objects to go through ASSIGN_INDEX_ALTREP()
, and at every iteration I was just planning on calling ELT(value, i)
(this is how R's ExtractSubset()
works). This ended up being much slower than the original method of actually doing the recycling due to all of the calls to INTEGER_ELT()
, which then further hands off to my Elt ALTREP method.
It is much more efficient to just call ELT(value, 0)
once, and then assign that repeatedly.
I'm not sure if all of this is more or less work than trying to handle the "recycling without allocating" size 1 case in vec_assign()
without trying to use altrep. It would be nice if we didnt have to touch vec_assign()
at all, but I don't think that is going to be the case.
Here are a few benchmarks. In particular, notice that the memory allocation is halved because value
is not recycled. It can generally be much faster (especially when you do it multiple times, as we'd do when they are columns in a data frame), and I don't think it is ever slower.
I'm sure there are bugs, but it seem promising
Master:
library(vctrs)
x <- 1:5000000 + 0L
idx <- 1:4000000 + 0L
value <- 1L
bench::mark(vec_assign(x, idx, value), iterations = 50)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_assign(x, idx, value) 12.1ms 13.5ms 69.4 34.3MB 152.
bench::mark(x = {
for (i in 1:20) {
xx <- vec_assign(x, idx, value)
}
}, iterations = 20)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 x 286ms 328ms 3.10 691MB 43.5
This PR
library(vctrs)
x <- 1:5000000 + 0L
idx <- 1:4000000 + 0L
value <- 1L
bench::mark(vec_assign(x, idx, value), iterations = 50)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_assign(x, idx, value) 9.03ms 10.1ms 97.1 19.1MB 97.1
bench::mark(x = {
for (i in 1:20) {
xx <- vec_assign(x, idx, value)
}
}, iterations = 20)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 x 223ms 234ms 4.12 385MB 26.0
Created on 2020-02-18 by the reprex package (v0.3.0)
Is x
copied in these benchmarks ?
Looks promising!
Can you also include benchmarks for vec_recycle()
please? The advantage of doing it this way is that it speeds up recycling which is called in a few other places than vec-assign.
Is x copied in these benchmarks ?
Yes because x
namedness is bumped in the .Call()
wrapper. Well actually I think we also need to change r_maybe_duplicate()
to use MAYBE_SHARED()
instead of MAYBE_REFERENCED()
. The former allows 1 reference whether the latter doesn't. When I change this in this branch, I get:
idx <- 1:4000000 + 0L
value <- 1L
x <- 1:5000000 + 0L
system.time(.Call(vctrs_assign, x, idx, value))
#> user system elapsed
#> 0.007 0.000 0.007
x <- 1:5000000 + 0L
system.time(vec_assign(x, idx, value))
#> user system elapsed
#> 0.01 0.00 0.01
Using this patch:
diff --git a/src/slice-assign.c b/src/slice-assign.c
index 751d518b..aac7b28f 100644
--- a/src/slice-assign.c
+++ b/src/slice-assign.c
@@ -92,7 +92,7 @@ SEXP vec_assign_impl(SEXP proxy, SEXP index, SEXP value, bool clone) {
R_len_t n = Rf_length(index); \
int* index_data = INTEGER(index); \
\
- SEXP out = PROTECT(clone ? Rf_shallow_duplicate(x) : x); \
+ SEXP out = PROTECT(clone ? r_maybe_duplicate(x) : x); \
CTYPE* out_data = DEREF(out); \
\
CTYPE elt; \
@@ -121,7 +121,7 @@ SEXP vec_assign_impl(SEXP proxy, SEXP index, SEXP value, bool clone) {
} \
\
const CTYPE* value_data = CONST_DEREF(value); \
- SEXP out = PROTECT(clone ? Rf_shallow_duplicate(x) : x); \
+ SEXP out = PROTECT(clone ? r_maybe_duplicate(x) : x); \
CTYPE* out_data = DEREF(out); \
\
for (R_len_t i = 0; i < n; ++i) { \
@@ -146,7 +146,7 @@ SEXP vec_assign_impl(SEXP proxy, SEXP index, SEXP value, bool clone) {
} \
\
const CTYPE* value_data = CONST_DEREF(value); \
- SEXP out = PROTECT(clone ? Rf_shallow_duplicate(x) : x); \
+ SEXP out = PROTECT(clone ? r_maybe_duplicate(x) : x); \
CTYPE* out_data = DEREF(out) + start; \
\
for (int i = 0; i < n; ++i, out_data += step, ++value_data) { \
@@ -199,7 +199,7 @@ static SEXP raw_assign(SEXP x, SEXP index, SEXP value, bool clone) {
"`value` should have been recycled to fit `x`."); \
} \
\
- SEXP out = PROTECT(clone ? Rf_shallow_duplicate(x) : x); \
+ SEXP out = PROTECT(clone ? r_maybe_duplicate(x) : x); \
\
for (R_len_t i = 0; i < n; ++i) { \
int j = index_data[i]; \
@@ -222,7 +222,7 @@ static SEXP raw_assign(SEXP x, SEXP index, SEXP value, bool clone) {
"`value` should have been recycled to fit `x`."); \
} \
\
- SEXP out = PROTECT(clone ? Rf_shallow_duplicate(x) : x); \
+ SEXP out = PROTECT(clone ? r_maybe_duplicate(x) : x); \
\
for (R_len_t i = 0; i < n; ++i, start += step) { \
SET(out, start, GET(value, i)); \
@@ -254,7 +254,7 @@ SEXP list_assign(SEXP x, SEXP index, SEXP value, bool clone) {
* [[ include("vctrs.h") ]]
*/
SEXP df_assign(SEXP x, SEXP index, SEXP value, bool clone) {
- SEXP out = PROTECT(clone ? Rf_shallow_duplicate(x) : x);
+ SEXP out = PROTECT(clone ? r_maybe_duplicate(x) : x);
R_len_t n = Rf_length(out);
for (R_len_t i = 0; i < n; ++i) {
diff --git a/src/utils.c b/src/utils.c
index f26d800a..adf703c2 100644
--- a/src/utils.c
+++ b/src/utils.c
@@ -1059,7 +1059,7 @@ bool r_is_function(SEXP x) {
}
SEXP r_maybe_duplicate(SEXP x) {
- if (MAYBE_REFERENCED(x)) {
+ if (MAYBE_SHARED(x)) {
return Rf_shallow_duplicate(x);
} else {
return x;
I'm not seeing any difference after applying the patch above. It looks like it still shallow duplicates to me?
Oh I see it if I run in the R console and not in RStudio. I forgot that RStudio keeps track of x
in a way that increments the namedness.
> vec_assign <- vctrs::vec_assign
> vctrs_assign <- vctrs:::vctrs_assign
>
> idx <- 1:4000000 + 0L
> value <- 1L
>
> x <- 1:5000000 + 0L
> system.time(.Call(vctrs_assign, x, idx, value))
Namedness: 1
not shallow duplicating!
user system elapsed
0.009 0.000 0.008
> x[2]
[1] 1
>
> x <- 1:5000000 + 0L
> system.time(vec_assign(x, idx, value))
Namedness: 7
shallow duplicating!
user system elapsed
0.012 0.000 0.012
> x[2]
[1] 2
Recycling benchmarks for integers:
Master:
library(vctrs)
x <- 1L
size <- 100000L
bench::mark(vec_recycle(x, size), iterations = 10000)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_recycle(x, size) 22.2µs 47.7µs 17413. 402KB 119.
size <- 10000000L
bench::mark(vec_recycle(x, size), iterations = 500)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_recycle(x, size) 4.26ms 4.61ms 214. 38.1MB 219.
vctrs_compact_rep_int
:
library(vctrs)
x <- 1L
size <- 100000L
bench::mark(vec_recycle(x, size), iterations = 10000)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_recycle(x, size) 912ns 1.3µs 749953. 10.9KB 75.0
size <- 10000000L
bench::mark(vec_recycle(x, size), iterations = 500)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_recycle(x, size) 1.23µs 1.31µs 543067. 0B 0
This PR now also includes vctrs_compact_rep_dbl
. It was relatively straightforward with the integer infrastructure in place.
Recycling benchmarks for doubles:
Master:
library(vctrs)
x <- 1
size <- 100000L
bench::mark(vec_recycle(x, size), iterations = 10000)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_recycle(x, size) 44.7µs 95.3µs 9463. 792KB 130.
size <- 10000000L
bench::mark(vec_recycle(x, size), iterations = 500)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_recycle(x, size) 8.64ms 9.16ms 107. 76.3MB 152.
library(vctrs)
x <- 1
size <- 100000L
bench::mark(vec_recycle(x, size), iterations = 10000)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_recycle(x, size) 893ns 1.18µs 803866. 10.9KB 80.4
size <- 10000000L
bench::mark(vec_recycle(x, size), iterations = 500)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_recycle(x, size) 1.12µs 1.21µs 579915. 0B 0
So with this change, recycling consistently takes ~1 µs. How small does size
have to be before this is slower than without ALTREP?
It looks like the call to vec_recycle()
itself is always faster or as fast. Recycling an integer of size 1 to size 2 is the fastest thing I can think of (doubles take longer), and it is still the same speed. Of course, you eventually have to materialize the ALTREP vector later on, which takes time
library(vctrs)
x <- 1L
size <- 2L
# master
bench::mark(vec_recycle(x, size), iterations = 10000)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_recycle(x, size) 1.24µs 1.55µs 617904. 10.9KB 124.
# this PR
bench::mark(vec_recycle(x, size), iterations = 10000)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_recycle(x, size) 1.1µs 1.43µs 675874. 10.9KB 135.
Created on 2020-02-19 by the reprex package (v0.3.0)
I have pushed the special vec_recycle()
ALTREP code further down to vec_slice_impl()
. This is interesting because we now also get vec_init()
generating ALTREP reps as well.
Really anything that slices with our older compact_rep()
index is going to create an ALTREP rep now (that probably needs a new name to be less confusing).
By pushing it down into vec_slice_impl()
, it means we can recycle data frames cheaply too (that didn't work before).
library(vctrs)
df <- data.frame(x = 1, y = 2L)
size <- 1e6L
# before
bench::mark(vec_init(df, size), iterations = 100)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_init(df, size) 1.25ms 1.5ms 493. 11.5MB 455.
# after
bench::mark(vec_init(df, size), iterations = 100)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_init(df, size) 7.25µs 10.2µs 97417. 84KB 78.0
# before
bench::mark(vec_recycle(df, size), iterations = 100)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_recycle(df, size) 1.24ms 1.45ms 632. 11.4MB 607.
# after
bench::mark(vec_recycle(df, size), iterations = 100)
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 vec_recycle(df, size) 2.04µs 2.85µs 297952. 0B 29.8
Nice!
We now have support for the 4 core types: integer, double, logical, and character!
Logicals are a bit tricky, as altlogical, altraw and altcomplex support was only added in R 3.6.0. So we have to have an extra layer of separation between that and the 3.5.0 ALTREP methods. But I don't think it looks too bad, and it should be easy to add raw and complex support with the way ive set it up.
@lionel- I'm curious about how you would tackle the code repetition between the methods. The Extract_subset
methods, for example, are essentially all the same and can be templated out with a macro. I was thinking about adding a altrep-rep-macros.h
file where I could hold the macros, and then each of the 4 type files would use them in their corresponding methods.
Pros of this are that we ensure that all 4 Extract_subset
methods are aligned, and if we ever have to update them then we do it in one place.
Cons are that it makes the code less readable because you'd have to go to another file to find the implementation. i.e. I'd have:
static SEXP vctrs_compact_rep_chr_Extract_subset(SEXP x, SEXP indx, SEXP call) {
VCTRS_COMPACT_REP_EXTRACT_SUBSET(CHR, STRSXP, SEXP, STRING_PTR, NA_STRING);
}
With the definition of VCTRS_COMPACT_REP_EXTRACT_SUBSET
in altrep-rep-macros.h
.
Another con is that if you just look at VCTRS_COMPACT_REP_EXTRACT_SUBSET
, you'll see references to x
and indx
and would have to look back to the caller of the macro to understand that they were arguments to the function. Maybe that can be solved by giving X
and SUBSCRIPT
parameters to the macro
@davisvaughan does this automatically flow through to the S3 vectors we’re creating? (eg factors)
@hadley, no, right now vec_slice_impl()
(and therefore vec_init()
and vec_recycle()
) is restricted to only generate a compact rep if the object is unclassed and has no attributes whatsoever.
I haven't fully thought through the implications of dealing with attributes, so I thought this would be safest for now, while still providing decent benefits
Here is the slicing function that does it https://github.com/r-lib/vctrs/pull/837/files#diff-f81a9e077e19f5a30adaa562ceddadbeR495-R534
Shouldn't attribute handling be orthogonal to altrep features? The memory representation of supported base types should be completely opaque to our handling of types right?
So if an S3 class has a proxy, possibily with attributes as factors would, we should be able to create an altrep repetition of the proxy, which will get restored later on? If the restore method does not cause a materialisation, the class should benefit from the altrep.
It would be nice if unspecified vectors created with vec_init()
were implemented as altrep repetions.
Cons are that it makes the code less readable because you'd have to go to another file to find the implementation.
I don't think that's much of a downside considering jump-to-definition. I like the idea of making them macros in a separate header file.
Aside: Do we need to use Uppercase for altrep methods? We use snake_case everywhere else.
This is a useful draft, but we probably would never merge it as is and will want to build it from scratch some time in the future. We can come back to use it as a guide when we do so