
Chapter 20: Outputs from C++ and R versions of functions are not the same

Open IndrajeetPatil opened this issue 3 years ago • 4 comments

For example, in section 20.3 Q6, the following code creates a version of union() in C++:

#include <Rcpp.h>
#include <unordered_set>
#include <algorithm>
using namespace Rcpp;

// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
IntegerVector unionC(IntegerVector x, IntegerVector y) {
  int nx = x.size();
  int ny = y.size();
  
  IntegerVector tmp(nx + ny);
  
  std::sort(x.begin(), x.end()); // std::set_union requires sorted input
  std::sort(y.begin(), y.end());
  
  IntegerVector::iterator out_end = std::set_union(
    x.begin(), x.end(), y.begin(), y.end(), tmp.begin()
  );
  
  int prev_value = 0;
  IntegerVector out;
  for (IntegerVector::iterator it = tmp.begin();
       it != out_end; ++it) {
    if ((it != tmp.begin())  && (prev_value == *it)) continue;
    
    out.push_back(*it);
    
    prev_value = *it;
  }
  
  return out;
}

But it doesn't produce the same output as its R equivalent:

# input vectors include duplicates
x <- c(1, 4, 5, 5, 5, 6, 2)
y <- c(4, 1, 6, 8)

union(x, y)
#> [1] 1 4 5 6 2 8

unionC(x, y)
#> [1] 1 2 4 5 6 8
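
The difference comes from the two std::sort() calls: unionC() returns the values in sorted order, while R's union() keeps each value at the position of its first appearance (x first, then y). A minimal sketch of an order-preserving variant is shown below. It is not taken from the solutions manual; the name union_first_seen is hypothetical, and it uses plain std::vector<int> instead of IntegerVector so that it compiles without Rcpp.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical helper (not from the solutions manual): union of x and y
// that keeps each value at the position of its first appearance,
// scanning x first and then y.
std::vector<int> union_first_seen(const std::vector<int>& x,
                                  const std::vector<int>& y) {
  std::unordered_set<int> seen;  // values already emitted
  std::vector<int> out;
  out.reserve(x.size() + y.size());

  // Append every value of v that has not been emitted yet.
  auto append_new = [&](const std::vector<int>& v) {
    for (int value : v) {
      // insert() returns {iterator, bool}; the bool is true only when
      // the value was not in the set before this call.
      if (seen.insert(value).second) out.push_back(value);
    }
  };

  append_new(x);
  append_new(y);
  return out;
}
```

For the inputs above, this sketch yields 1 4 5 6 2 8, the same order as union(x, y). It trades the sort for a hash set, so the inputs are left untouched.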

IndrajeetPatil avatar May 18 '22 19:05 IndrajeetPatil

Here is another example from the same section (20.3), Q3:

// As a one-liner
// [[Rcpp::export]]
std::unordered_set<double> uniqueCC(NumericVector x) {
  return std::unordered_set<double>(x.begin(), x.end());
}

The outputs are different:

v1 <- c(1, 3, 3, 6, 7, 8, 9)

unique(v1)
#> [1] 1 3 6 7 8 9

uniqueCC(v1)
#> [1] 9 8 1 7 3 6
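
The order differs because std::unordered_set makes no guarantee about iteration order, so the result depends on the hash table layout rather than on the input. One way to get the order of unique() is to use the set only for membership checks and to collect the values in a separate vector. The sketch below assumes plain std::vector<double> in place of NumericVector, and the name unique_first_seen is hypothetical.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical helper (not from the solutions manual): distinct values of x
// in order of first appearance.
// Caveat: NaN never compares equal to itself, so repeated NaN values are
// all kept here, which differs from R's unique().
std::vector<double> unique_first_seen(const std::vector<double>& x) {
  std::unordered_set<double> seen;  // used only for membership checks
  std::vector<double> out;
  for (double value : x) {
    // The bool returned by insert() is true only for a value not seen before.
    if (seen.insert(value).second) out.push_back(value);
  }
  return out;
}
```

For v1 above, this sketch yields 1 3 6 7 8 9, matching unique(v1).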

IndrajeetPatil avatar May 18 '22 19:05 IndrajeetPatil

Do you have an example with a differing result, e.g. a differing set? It seems that only the order is different.

(That said, I can see that it can make a huge difference in code, as I quite often rely on the order of unique(), i.e. first appearance, in real-world code.)

Tazinho avatar May 18 '22 21:05 Tazinho

No, but I thought the exercises expect these results to be the same (they are indeed the same for all other Rcpp chapter exercises, except the ones using the STL).

Otherwise, the performance of the R and C++ functions can't be compared, since they produce different outputs.

IndrajeetPatil avatar May 18 '22 21:05 IndrajeetPatil

I agree. Thanks for the additional thoughts. It makes sense to fix this at some point, or at least to leave a note in the answer. I also edited my comment above.

Tazinho avatar May 18 '22 22:05 Tazinho