vctrs Proof of concept) ALTREP compact integer reps

Proof of concept) ALTREP compact integer reps

Open DavisVaughan opened this issue 4 years ago • 16 comments

CC @lionel- @romainfrancois

This is an implementation of an ALTREP compact integer rep, inspired by talking with @lionel- about this being a possible way forward for introducing efficient recycling.

The idea is that:

vec_recycle() can return this when recycling a size 1 integer to size n, avoiding a recycle
vec_assign() knows how to work with these in a clever way, reusing that size 1 value

In the implementation, I tried just allowing all ALTREP() objects to go through ASSIGN_INDEX_ALTREP(), and at every iteration I was just planning on calling ELT(value, i) (this is how R's ExtractSubset() works). This ended up being much slower than the original method of actually doing the recycling due to all of the calls to INTEGER_ELT(), which then further hands off to my Elt ALTREP method.

It is much more efficient to just call ELT(value, 0) once, and then assign that repeatedly.

I'm not sure if all of this is more or less work than trying to handle the "recycling without allocating" size 1 case in vec_assign() without trying to use altrep. It would be nice if we didnt have to touch vec_assign() at all, but I don't think that is going to be the case.

Here are a few benchmarks. In particular, notice that the memory allocation is halved because value is not recycled. It can generally be much faster (especially when you do it multiple times, as we'd do when they are columns in a data frame), and I don't think it is ever slower.

I'm sure there are bugs, but it seem promising

Master:

library(vctrs)

x <- 1:5000000 + 0L
idx <- 1:4000000 + 0L

value <- 1L

bench::mark(vec_assign(x, idx, value), iterations = 50)
#> # A tibble: 1 x 6
#>   expression                     min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_assign(x, idx, value)   12.1ms   13.5ms      69.4    34.3MB     152.

bench::mark(x = {
  for (i in 1:20) {
    xx <- vec_assign(x, idx, value)
  }
}, iterations = 20)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 x             286ms    328ms      3.10     691MB     43.5

This PR

library(vctrs)

x <- 1:5000000 + 0L
idx <- 1:4000000 + 0L

value <- 1L

bench::mark(vec_assign(x, idx, value), iterations = 50)
#> # A tibble: 1 x 6
#>   expression                     min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_assign(x, idx, value)   9.03ms   10.1ms      97.1    19.1MB     97.1

bench::mark(x = {
  for (i in 1:20) {
    xx <- vec_assign(x, idx, value)
  }
}, iterations = 20)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 x             223ms    234ms      4.12     385MB     26.0

^{Created on 2020-02-18 by the reprex package (v0.3.0)}

Feb 19 '20 03:02 DavisVaughan

Is x copied in these benchmarks ?

Feb 19 '20 08:02 romainfrancois

Looks promising!

Can you also include benchmarks for vec_recycle() please? The advantage of doing it this way is that it speeds up recycling which is called in a few other places than vec-assign.

Is x copied in these benchmarks ?

Yes because x namedness is bumped in the .Call() wrapper. Well actually I think we also need to change r_maybe_duplicate() to use MAYBE_SHARED() instead of MAYBE_REFERENCED(). The former allows 1 reference whether the latter doesn't. When I change this in this branch, I get:

idx <- 1:4000000 + 0L
value <- 1L

x <- 1:5000000 + 0L
system.time(.Call(vctrs_assign, x, idx, value))
#>    user  system elapsed
#>   0.007   0.000   0.007

x <- 1:5000000 + 0L
system.time(vec_assign(x, idx, value))
#>    user  system elapsed
#>    0.01    0.00    0.01

Using this patch:

diff --git a/src/slice-assign.c b/src/slice-assign.c
index 751d518b..aac7b28f 100644
--- a/src/slice-assign.c
+++ b/src/slice-assign.c
@@ -92,7 +92,7 @@ SEXP vec_assign_impl(SEXP proxy, SEXP index, SEXP value, bool clone) {
   R_len_t n = Rf_length(index);                                \
   int* index_data = INTEGER(index);                            \
                                                                \
-  SEXP out = PROTECT(clone ? Rf_shallow_duplicate(x) : x);     \
+  SEXP out = PROTECT(clone ? r_maybe_duplicate(x) : x);        \
   CTYPE* out_data = DEREF(out);                                \
                                                                \
   CTYPE elt;                                                   \
@@ -121,7 +121,7 @@ SEXP vec_assign_impl(SEXP proxy, SEXP index, SEXP value, bool clone) {
   }                                                             \
                                                                 \
   const CTYPE* value_data = CONST_DEREF(value);                 \
-  SEXP out = PROTECT(clone ? Rf_shallow_duplicate(x) : x);      \
+  SEXP out = PROTECT(clone ? r_maybe_duplicate(x) : x);         \
   CTYPE* out_data = DEREF(out);                                 \
                                                                 \
   for (R_len_t i = 0; i < n; ++i) {                             \
@@ -146,7 +146,7 @@ SEXP vec_assign_impl(SEXP proxy, SEXP index, SEXP value, bool clone) {
   }                                                             \
                                                                 \
   const CTYPE* value_data = CONST_DEREF(value);                 \
-  SEXP out = PROTECT(clone ? Rf_shallow_duplicate(x) : x);      \
+  SEXP out = PROTECT(clone ? r_maybe_duplicate(x) : x);         \
   CTYPE* out_data = DEREF(out) + start;                         \
                                                                 \
   for (int i = 0; i < n; ++i, out_data += step, ++value_data) { \
@@ -199,7 +199,7 @@ static SEXP raw_assign(SEXP x, SEXP index, SEXP value, bool clone) {
              "`value` should have been recycled to fit `x`.");  \
   }                                                             \
                                                                 \
-  SEXP out = PROTECT(clone ? Rf_shallow_duplicate(x) : x);      \
+  SEXP out = PROTECT(clone ? r_maybe_duplicate(x) : x);         \
                                                                 \
   for (R_len_t i = 0; i < n; ++i) {                             \
     int j = index_data[i];                                      \
@@ -222,7 +222,7 @@ static SEXP raw_assign(SEXP x, SEXP index, SEXP value, bool clone) {
              "`value` should have been recycled to fit `x`.");  \
   }                                                             \
                                                                 \
-  SEXP out = PROTECT(clone ? Rf_shallow_duplicate(x) : x);      \
+  SEXP out = PROTECT(clone ? r_maybe_duplicate(x) : x);         \
                                                                 \
   for (R_len_t i = 0; i < n; ++i, start += step) {              \
     SET(out, start, GET(value, i));                             \
@@ -254,7 +254,7 @@ SEXP list_assign(SEXP x, SEXP index, SEXP value, bool clone) {
  * [[ include("vctrs.h") ]]
  */
 SEXP df_assign(SEXP x, SEXP index, SEXP value, bool clone) {
-  SEXP out = PROTECT(clone ? Rf_shallow_duplicate(x) : x);
+  SEXP out = PROTECT(clone ? r_maybe_duplicate(x) : x);
   R_len_t n = Rf_length(out);
 
   for (R_len_t i = 0; i < n; ++i) {
diff --git a/src/utils.c b/src/utils.c
index f26d800a..adf703c2 100644
--- a/src/utils.c
+++ b/src/utils.c
@@ -1059,7 +1059,7 @@ bool r_is_function(SEXP x) {
 }
 
 SEXP r_maybe_duplicate(SEXP x) {
-  if (MAYBE_REFERENCED(x)) {
+  if (MAYBE_SHARED(x)) {
     return Rf_shallow_duplicate(x);
   } else {
     return x;

Feb 19 '20 09:02 lionel-

I'm not seeing any difference after applying the patch above. It looks like it still shallow duplicates to me?

Feb 19 '20 14:02 DavisVaughan

Oh I see it if I run in the R console and not in RStudio. I forgot that RStudio keeps track of x in a way that increments the namedness.

> vec_assign <- vctrs::vec_assign
> vctrs_assign <- vctrs:::vctrs_assign
> 
> idx <- 1:4000000 + 0L
> value <- 1L
> 
> x <- 1:5000000 + 0L
> system.time(.Call(vctrs_assign, x, idx, value))
Namedness: 1
not shallow duplicating!
   user  system elapsed 
  0.009   0.000   0.008 
> x[2]
[1] 1
> 
> x <- 1:5000000 + 0L
> system.time(vec_assign(x, idx, value))
Namedness: 7
shallow duplicating!
   user  system elapsed 
  0.012   0.000   0.012 
> x[2]
[1] 2

Feb 19 '20 14:02 DavisVaughan

Recycling benchmarks for integers:

Master:

library(vctrs)

x <- 1L

size <- 100000L
bench::mark(vec_recycle(x, size), iterations = 10000)
#> # A tibble: 1 x 6
#>   expression                min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recycle(x, size)   22.2µs   47.7µs    17413.     402KB     119.

size <- 10000000L
bench::mark(vec_recycle(x, size), iterations = 500)
#> # A tibble: 1 x 6
#>   expression                min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recycle(x, size)   4.26ms   4.61ms      214.    38.1MB     219.

vctrs_compact_rep_int:

library(vctrs)

x <- 1L

size <- 100000L
bench::mark(vec_recycle(x, size), iterations = 10000)
#> # A tibble: 1 x 6
#>   expression                min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recycle(x, size)    912ns    1.3µs   749953.    10.9KB     75.0

size <- 10000000L
bench::mark(vec_recycle(x, size), iterations = 500)
#> # A tibble: 1 x 6
#>   expression                min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recycle(x, size)   1.23µs   1.31µs   543067.        0B        0

Feb 19 '20 20:02 DavisVaughan

This PR now also includes vctrs_compact_rep_dbl. It was relatively straightforward with the integer infrastructure in place.

Recycling benchmarks for doubles:

Master:

library(vctrs)

x <- 1

size <- 100000L
bench::mark(vec_recycle(x, size), iterations = 10000)
#> # A tibble: 1 x 6
#>   expression                min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recycle(x, size)   44.7µs   95.3µs     9463.     792KB     130.

size <- 10000000L
bench::mark(vec_recycle(x, size), iterations = 500)
#> # A tibble: 1 x 6
#>   expression                min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recycle(x, size)   8.64ms   9.16ms      107.    76.3MB     152.

library(vctrs)

x <- 1

size <- 100000L
bench::mark(vec_recycle(x, size), iterations = 10000)
#> # A tibble: 1 x 6
#>   expression                min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recycle(x, size)    893ns   1.18µs   803866.    10.9KB     80.4

size <- 10000000L
bench::mark(vec_recycle(x, size), iterations = 500)
#> # A tibble: 1 x 6
#>   expression                min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recycle(x, size)   1.12µs   1.21µs   579915.        0B        0

Feb 19 '20 20:02 DavisVaughan

So with this change, recycling consistently takes ~1 µs. How small does size have to be before this is slower than without ALTREP?

Feb 19 '20 21:02 hadley

It looks like the call to vec_recycle() itself is always faster or as fast. Recycling an integer of size 1 to size 2 is the fastest thing I can think of (doubles take longer), and it is still the same speed. Of course, you eventually have to materialize the ALTREP vector later on, which takes time

library(vctrs)

x <- 1L
size <- 2L

# master
bench::mark(vec_recycle(x, size), iterations = 10000)
#> # A tibble: 1 x 6
#>   expression                min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recycle(x, size)   1.24µs   1.55µs   617904.    10.9KB     124.

# this PR
bench::mark(vec_recycle(x, size), iterations = 10000)
#> # A tibble: 1 x 6
#>   expression                min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recycle(x, size)    1.1µs   1.43µs   675874.    10.9KB     135.

^{Created on 2020-02-19 by the reprex package (v0.3.0)}

Feb 19 '20 22:02 DavisVaughan

I have pushed the special vec_recycle() ALTREP code further down to vec_slice_impl(). This is interesting because we now also get vec_init() generating ALTREP reps as well.

Really anything that slices with our older compact_rep() index is going to create an ALTREP rep now (that probably needs a new name to be less confusing).

By pushing it down into vec_slice_impl(), it means we can recycle data frames cheaply too (that didn't work before).

library(vctrs)

df <- data.frame(x = 1, y = 2L)
size <- 1e6L

# before
bench::mark(vec_init(df, size), iterations = 100)
#> # A tibble: 1 x 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_init(df, size)   1.25ms    1.5ms      493.    11.5MB     455.

# after
bench::mark(vec_init(df, size), iterations = 100)
#> # A tibble: 1 x 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_init(df, size)   7.25µs   10.2µs    97417.      84KB     78.0

# before
bench::mark(vec_recycle(df, size), iterations = 100)
#> # A tibble: 1 x 6
#>   expression                 min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>            <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recycle(df, size)   1.24ms   1.45ms      632.    11.4MB     607.

# after
bench::mark(vec_recycle(df, size), iterations = 100)
#> # A tibble: 1 x 6
#>   expression                 min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>            <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_recycle(df, size)   2.04µs   2.85µs   297952.        0B     29.8

Feb 19 '20 22:02 DavisVaughan

Nice!

Feb 20 '20 08:02 lionel-

We now have support for the 4 core types: integer, double, logical, and character!

Logicals are a bit tricky, as altlogical, altraw and altcomplex support was only added in R 3.6.0. So we have to have an extra layer of separation between that and the 3.5.0 ALTREP methods. But I don't think it looks too bad, and it should be easy to add raw and complex support with the way ive set it up.

@lionel- I'm curious about how you would tackle the code repetition between the methods. The Extract_subset methods, for example, are essentially all the same and can be templated out with a macro. I was thinking about adding a altrep-rep-macros.h file where I could hold the macros, and then each of the 4 type files would use them in their corresponding methods.

Pros of this are that we ensure that all 4 Extract_subset methods are aligned, and if we ever have to update them then we do it in one place.

Cons are that it makes the code less readable because you'd have to go to another file to find the implementation. i.e. I'd have:

static SEXP vctrs_compact_rep_chr_Extract_subset(SEXP x, SEXP indx, SEXP call) {
  VCTRS_COMPACT_REP_EXTRACT_SUBSET(CHR, STRSXP, SEXP, STRING_PTR, NA_STRING);
}

With the definition of VCTRS_COMPACT_REP_EXTRACT_SUBSET in altrep-rep-macros.h.

Another con is that if you just look at VCTRS_COMPACT_REP_EXTRACT_SUBSET, you'll see references to x and indx and would have to look back to the caller of the macro to understand that they were arguments to the function. Maybe that can be solved by giving X and SUBSCRIPT parameters to the macro

Feb 22 '20 03:02 DavisVaughan

@davisvaughan does this automatically flow through to the S3 vectors we’re creating? (eg factors)

Feb 22 '20 15:02 hadley

@hadley, no, right now vec_slice_impl() (and therefore vec_init() and vec_recycle()) is restricted to only generate a compact rep if the object is unclassed and has no attributes whatsoever.

I haven't fully thought through the implications of dealing with attributes, so I thought this would be safest for now, while still providing decent benefits

Here is the slicing function that does it https://github.com/r-lib/vctrs/pull/837/files#diff-f81a9e077e19f5a30adaa562ceddadbeR495-R534

Feb 22 '20 16:02 DavisVaughan

Shouldn't attribute handling be orthogonal to altrep features? The memory representation of supported base types should be completely opaque to our handling of types right?

So if an S3 class has a proxy, possibily with attributes as factors would, we should be able to create an altrep repetition of the proxy, which will get restored later on? If the restore method does not cause a materialisation, the class should benefit from the altrep.

Feb 24 '20 12:02 lionel-

It would be nice if unspecified vectors created with vec_init() were implemented as altrep repetions.

Feb 24 '20 12:02 lionel-

Cons are that it makes the code less readable because you'd have to go to another file to find the implementation.

I don't think that's much of a downside considering jump-to-definition. I like the idea of making them macros in a separate header file.

Aside: Do we need to use Uppercase for altrep methods? We use snake_case everywhere else.

Feb 24 '20 12:02 lionel-

This is a useful draft, but we probably would never merge it as is and will want to build it from scratch some time in the future. We can come back to use it as a guide when we do so

Sep 28 '22 14:09 DavisVaughan

vctrs vctrs copied to clipboard

Proof of concept) ALTREP compact integer reps

vctrs
vctrs copied to clipboard