xtensor-r reducer assignment issues

So, I've been debugging an issue all morning, and finally think I have a grasp of what's happening. It's not a fun one, so be warned.

The short version is: I think when you assign the result of xt::reduce() back to an xt::rarray, something goes wrong, but if you assign it back to xt::xarray then coerce that to xt::rarray everything is fine. This only seems to show up when reducing over large-ish (>2500 elements) matrices.

The long version I'm trying (and successfully!) to expose the "custom reducer" idea to the R user, so they can do something like rray_reducer(x, function(.x, .y) {.x + .y}) to get a reduced sum over a specified axis. It would be great because you can then use any R function as the reducer function (probably with some performance loss because you have to call back to R repeatedly from cpp, but im okay with that for now).

It works for small matrices (or maybe im getting lucky) but when I up the size to ~2500 elements or more, things fall apart.

The only way I can think to give you a reproducible example is to make a standalone cpp file that you can use Rcpp::sourceFile() on...so...I'm sorry about that.

This version goes to xarray first, then to rarray. It seems to work fine. In the rray_custom_reducer_impl function, with the comment // THIS IS THE PROBLEM LINE! is the section that is causing the issue. It works fine in this example because of the conversion first to xarray.

// [[Rcpp::depends(xtensorrr)]]
// [[Rcpp::plugins(cpp14)]]

#include <xtensor-r/rarray.hpp>
#include <xtensor/xreducer.hpp>
#include <xtensor/xio.hpp>
#include <xtensor/xarray.hpp>
#include <Rcpp.h>
using namespace Rcpp;

// How does this work?
// 1) rray_custom_reducer_cpp() takes x, a function with 2 args, the axes, and the return type
// 2) Based on the types of x and the return type, it calls something along the lines of:
//    rray_custom_reducer_impl<TYPEOFX, RETURNTYPE>(x, f, axes);
// 3) rray_custom_reducer_impl
//    - coerces x to an rarray
//    - creates a binary_functor using f and all of the type info
//    - then makes an xreducer_functor out of that and calls reduce using it
// 4) binary_functor this is where the magic happens
//    it lets you use scoping so you can initialize it with "things" you'd like
//    it to have available when the function is called (by the overloaded () operator)
//    - So we give it the function and the SEXPTYPE return type in the constructor
//    - And then we template on the Cpp element and return type
//    - Each call to the overloaded () operator calls f and coerces the result
//      to RET_T if it can be, otherwise error


// Functor allows a kind of scoping so r_functor(left, right)
// internally knows about the R function f
template <typename ELEM_T, typename RET_T>
class binary_functor {

private:
  Rcpp::Function f;
  SEXPTYPE RTYPE;
  R_xlen_t good_length = 1;

public:
  binary_functor(Rcpp::Function f_, SEXPTYPE RTYPE_) : f(f_), RTYPE(RTYPE_) {  }

  // Also pulling the first element of the result, even if it had more than
  // one thing. Should do checking instead.
  RET_T operator () (const ELEM_T& x, const ELEM_T& y) const {

    //std::cout << "x: " << x << std::endl;
    //std::cout << "y: " << y << std::endl;

    SEXP f_res = f(x, y);

    //Rcpp::print(f_res);

    // Type check
    if (TYPEOF(f_res) != RTYPE) {
      const char* good_type = Rf_type2char(RTYPE);
      const char* bad_type = Rf_type2char(TYPEOF(f_res));
      Rf_errorcall(
        R_NilValue,
        "`.f` should return length 1 objects of class `%s`, not class `%s`.",
        good_type,
        bad_type
      );
    }

    // Length check
    if (Rf_xlength(f_res) != good_length) {
      Rf_errorcall(
        R_NilValue,
        "`.f` should return objects of length 1, not %i.",
        Rf_xlength(f_res)
      );
    }

    // Clever conversion to replace explicit use of
    // REAL() or INTEGER()
    RET_T res = ((RET_T *)DATAPTR(f_res))[0];

    //std::cout << "res: " << res << std::endl;

    return(res);
  }
};


template <typename ELEM_R_T, typename RET_R_T>
SEXP rray_custom_reducer_impl(SEXP x, Rcpp::Function f, std::vector<std::size_t> axes) {

  // Underlying Cpp type. This helper is necessary because it converts
  // rlogical->int
  using elem_t = xt::r_detail::get_underlying_value_type_r<ELEM_R_T>;
  using ret_t  = xt::r_detail::get_underlying_value_type_r<RET_R_T>;

  // Pull the SEXPTYPE of the return value from the cpp type
  // The helper is necessary because rlogical->LGLSXP has been specially registered
  static constexpr int ret_sexptype = Rcpp::traits::r_sexptype_traits<RET_R_T>::rtype;

  // Create the rarray with the type matching x
  const xt::rarray<ELEM_R_T>& x_rray = xt::rarray<ELEM_R_T>(x);

  // Create the functor
  auto r_functor = binary_functor<typename elem_t::type, typename ret_t::type>(f, ret_sexptype);
  auto xr_functor = xt::make_xreducer_functor(r_functor);

  // THIS IS THE PROBLEM LINE!
  xt::xarray<typename ret_t::type> res = xt::reduce(xr_functor, x_rray, axes);
  xt::rarray<RET_R_T> res2 = res;

  std::cout << "xarray: " << res << std::endl;
  std::cout << "rarray: " << res2 << std::endl;

  return(res2);

}

// Helper for switching on the string op
constexpr unsigned int str2int(const char* str, int h = 0) {
  return !str[h] ? 5381 : (str2int(str, h+1) * 33) ^ str[h];
}

// [[Rcpp::export]]
SEXP rray_custom_reducer_cpp(SEXP x, Rcpp::Function f, std::vector<std::size_t> axes, SEXP type_) {

  // Collect the char from the string type ("double", "integer", "logical")
  const char* type = CHAR(Rf_asChar(type_));

  // Switch on X
  switch(TYPEOF(x)) {

    // This means ELEM_T = double
    case REALSXP: {

      // Switch on return type:
      // "double" = REALSXP
      // "integer" = INTSXP
      // "logical" = LGLSXP
      switch(str2int(type)) {

        case str2int("double"): {
          return rray_custom_reducer_impl<double, double>(x, f, axes);
        }

        case str2int("integer"): {
          return rray_custom_reducer_impl<double, int>(x, f, axes);
        }

        case str2int("logical"): {
          return rray_custom_reducer_impl<double, rlogical>(x, f, axes);
        }

        default: {
          Rcpp::stop("Incompatible SEXP encountered; only accepts doubles, integers, and logicals.");
        }

      }

    }

    // This means ELEM_T = int
  case INTSXP: {
    switch(str2int(type)) {

      case str2int("double"): {
        return rray_custom_reducer_impl<int, double>(x, f, axes);
      }

      case str2int("integer"): {
        return rray_custom_reducer_impl<int, int>(x, f, axes);
      }

      case str2int("logical"): {
        return rray_custom_reducer_impl<int, rlogical>(x, f, axes);
      }

      default: {
        Rcpp::stop("Incompatible SEXP encountered; only accepts doubles, integers, and logicals.");
      }

    }
  }

    // This means ELEM_T = int
  case LGLSXP: {
    switch(str2int(type)) {

      case str2int("double"): {
        return rray_custom_reducer_impl<rlogical, double>(x, f, axes);
      }

      case str2int("integer"): {
        return rray_custom_reducer_impl<rlogical, int>(x, f, axes);
      }

      case str2int("logical"): {
        return rray_custom_reducer_impl<rlogical, rlogical>(x, f, axes);
      }

      default: {
        Rcpp::stop("Incompatible SEXP encountered; only accepts doubles, integers, and logicals.");
      }

    }
  }

  default: {
    Rcpp::stop("Incompatible SEXP encountered; only accepts doubles, integers, and logicals.");
  }

  }
}

Correct results:

Rcpp::sourceCpp("~/Desktop/test.cpp")

x <- matrix(as.double(1:5000), ncol = 2)

fn <- function(.x, .y) {.x + .y}

rray_custom_reducer_cpp(x, fn, 0L, "double")
#> [1] 3126250 9376250

standard output and standard error

xarray: { 3126250.,  9376250.}
rarray: { 3126250.,  9376250.}

^{Created on 2018-12-09 by the reprex package (v0.2.1.9000)}

This is the not working version, even though I think it should work. It's more...unstable...than actually not working as the rarray result has variable output. The only thing I changed is that problem line, and now I just return 1 because I don't trust the output. But a key thing to look at here is the std output, where I've printed resulting rarray objects from the two calls to the same function. I had to try and run it multiple times because sometimes it just crashes...

// [[Rcpp::depends(xtensorrr)]]
// [[Rcpp::plugins(cpp14)]]

#include <xtensor-r/rarray.hpp>
#include <xtensor/xreducer.hpp>
#include <xtensor/xio.hpp>
#include <xtensor/xarray.hpp>
#include <Rcpp.h>
using namespace Rcpp;

// How does this work?
// 1) rray_custom_reducer_cpp() takes x, a function with 2 args, the axes, and the return type
// 2) Based on the types of x and the return type, it calls something along the lines of:
//    rray_custom_reducer_impl<TYPEOFX, RETURNTYPE>(x, f, axes);
// 3) rray_custom_reducer_impl
//    - coerces x to an rarray
//    - creates a binary_functor using f and all of the type info
//    - then makes an xreducer_functor out of that and calls reduce using it
// 4) binary_functor this is where the magic happens
//    it lets you use scoping so you can initialize it with "things" you'd like
//    it to have available when the function is called (by the overloaded () operator)
//    - So we give it the function and the SEXPTYPE return type in the constructor
//    - And then we template on the Cpp element and return type
//    - Each call to the overloaded () operator calls f and coerces the result
//      to RET_T if it can be, otherwise error


// Functor allows a kind of scoping so r_functor(left, right)
// internally knows about the R function f
template <typename ELEM_T, typename RET_T>
class binary_functor {

private:
  Rcpp::Function f;
  SEXPTYPE RTYPE;
  R_xlen_t good_length = 1;

public:
  binary_functor(Rcpp::Function f_, SEXPTYPE RTYPE_) : f(f_), RTYPE(RTYPE_) {  }

  // Also pulling the first element of the result, even if it had more than
  // one thing. Should do checking instead.
  RET_T operator () (const ELEM_T& x, const ELEM_T& y) const {

    //std::cout << "x: " << x << std::endl;
    //std::cout << "y: " << y << std::endl;

    SEXP f_res = f(x, y);

    //Rcpp::print(f_res);

    // Type check
    if (TYPEOF(f_res) != RTYPE) {
      const char* good_type = Rf_type2char(RTYPE);
      const char* bad_type = Rf_type2char(TYPEOF(f_res));
      Rf_errorcall(
        R_NilValue,
        "`.f` should return length 1 objects of class `%s`, not class `%s`.",
        good_type,
        bad_type
      );
    }

    // Length check
    if (Rf_xlength(f_res) != good_length) {
      Rf_errorcall(
        R_NilValue,
        "`.f` should return objects of length 1, not %i.",
        Rf_xlength(f_res)
      );
    }

    // Clever conversion to replace explicit use of
    // REAL() or INTEGER()
    RET_T res = ((RET_T *)DATAPTR(f_res))[0];

    //std::cout << "res: " << res << std::endl;

    return(res);
  }
};


template <typename ELEM_R_T, typename RET_R_T>
SEXP rray_custom_reducer_impl(SEXP x, Rcpp::Function f, std::vector<std::size_t> axes) {

  // Underlying Cpp type. This helper is necessary because it converts
  // rlogical->int
  using elem_t = xt::r_detail::get_underlying_value_type_r<ELEM_R_T>;
  using ret_t  = xt::r_detail::get_underlying_value_type_r<RET_R_T>;

  // Pull the SEXPTYPE of the return value from the cpp type
  // The helper is necessary because rlogical->LGLSXP has been specially registered
  static constexpr int ret_sexptype = Rcpp::traits::r_sexptype_traits<RET_R_T>::rtype;

  // Create the rarray with the type matching x
  const xt::rarray<ELEM_R_T>& x_rray = xt::rarray<ELEM_R_T>(x);

  // Create the functor
  auto r_functor = binary_functor<typename elem_t::type, typename ret_t::type>(f, ret_sexptype);
  auto xr_functor = xt::make_xreducer_functor(r_functor);

  // THIS IS THE PROBLEM LINE!
  xt::rarray<RET_R_T> res = xt::reduce(xr_functor, x_rray, axes);

  std::cout << "rarray: " << res << std::endl;

  return(Rf_ScalarInteger(1));

}

// Helper for switching on the string op
constexpr unsigned int str2int(const char* str, int h = 0) {
  return !str[h] ? 5381 : (str2int(str, h+1) * 33) ^ str[h];
}

// [[Rcpp::export]]
SEXP rray_custom_reducer_cpp2(SEXP x, Rcpp::Function f, std::vector<std::size_t> axes, SEXP type_) {

  // Collect the char from the string type ("double", "integer", "logical")
  const char* type = CHAR(Rf_asChar(type_));

  // Switch on X
  switch(TYPEOF(x)) {

  // This means ELEM_T = double
  case REALSXP: {

    // Switch on return type:
    // "double" = REALSXP
    // "integer" = INTSXP
    // "logical" = LGLSXP
    switch(str2int(type)) {

  case str2int("double"): {
    return rray_custom_reducer_impl<double, double>(x, f, axes);
  }

  case str2int("integer"): {
    return rray_custom_reducer_impl<double, int>(x, f, axes);
  }

  case str2int("logical"): {
    return rray_custom_reducer_impl<double, rlogical>(x, f, axes);
  }

  default: {
    Rcpp::stop("Incompatible SEXP encountered; only accepts doubles, integers, and logicals.");
  }

  }

  }

    // This means ELEM_T = int
  case INTSXP: {
    switch(str2int(type)) {

  case str2int("double"): {
    return rray_custom_reducer_impl<int, double>(x, f, axes);
  }

  case str2int("integer"): {
    return rray_custom_reducer_impl<int, int>(x, f, axes);
  }

  case str2int("logical"): {
    return rray_custom_reducer_impl<int, rlogical>(x, f, axes);
  }

  default: {
    Rcpp::stop("Incompatible SEXP encountered; only accepts doubles, integers, and logicals.");
  }

  }
  }

    // This means ELEM_T = int
  case LGLSXP: {
    switch(str2int(type)) {

  case str2int("double"): {
    return rray_custom_reducer_impl<rlogical, double>(x, f, axes);
  }

  case str2int("integer"): {
    return rray_custom_reducer_impl<rlogical, int>(x, f, axes);
  }

  case str2int("logical"): {
    return rray_custom_reducer_impl<rlogical, rlogical>(x, f, axes);
  }

  default: {
    Rcpp::stop("Incompatible SEXP encountered; only accepts doubles, integers, and logicals.");
  }

  }
  }

  default: {
    Rcpp::stop("Incompatible SEXP encountered; only accepts doubles, integers, and logicals.");
  }

  }
}

Sometimes this will crash on you, sometimes it will just give odd results...

Rcpp::sourceCpp("~/Desktop/test2.cpp")

x <- matrix(as.double(1:5000), ncol = 2)

fn <- function(.x, .y) {.x + .y}

rray_custom_reducer_cpp2(x, fn, 0L, "double")
#> [1] 1

rray_custom_reducer_cpp2(x, fn, 0L, "double")
#> [1] 1

standard output and standard error

rarray: { 3126250.}
rarray: { 6.938981e-310}

^{Created on 2018-12-09 by the reprex package (v0.2.1.9000)}

Dec 09 '18 18:12 DavisVaughan

I am looking into this... seems an awkward bug! But I can reproduce it.

Dec 10 '18 14:12 wolfv

I dove a bit more into it and I did not have the same problems when using xt::sum(...). So I am wondering if the segfault is related to calling the R-reducing function? I tried to add a PROTECT statement around f(x, y) but that didn't work for now. However, there might be something along those lines going on? Are we sure that there are no memory/GC issues inside the binary_functor?

I also see Shield<SEXP> my_result = something_from_r; used a lot in Rcpp code.

Dec 10 '18 14:12 wolfv

Hey, I think I found it. The way we're assigning from xexpressions is currently broken. Will fix ASAP. Sorry for these painful bugs and thanks for the awesome bug reports!

Dec 10 '18 15:12 wolfv

I think I did try Shield() and some PROTECT things as well and didn't have any luck.

Hopefully you can fix it with the xexpression assignment! Not a problem about it being buggy, I'm more than happy to hunt these down for you.

Dec 10 '18 15:12 DavisVaughan

I've investigated this quite a bit now and am still not sure what's going on exactly.

With pure-C++ functors it seems to work fine. I also call gc from Rcpp using

Rcpp::Function("gc")();

which noticeably slows the function call but doesn't lead to a segfault.

Things like rarray<double> s = xt::sum(input, axes) never segfaulted for me.

Maybe we can simplify this even further and ask on the Rcpp mailing list for help?

Btw calling back to R in the reducer will be a lot slower than pure C++ :)

Dec 11 '18 13:12 wolfv

I'll see if I can simplify further.

I'm aware of the painful speed loss from calling back to R from cpp, but think that the flexibility might be nice for small scale things, even if its slower.

Dec 11 '18 13:12 DavisVaughan

maybe we can make an example for only Numeric / double stuff and see if the guys over at Rcpp have ideas.

Dec 11 '18 14:12 wolfv

You mean like this?

// [[Rcpp::depends(xtensorrr)]]
// [[Rcpp::plugins(cpp14)]]

#include <xtensor-r/rarray.hpp>
#include <xtensor/xreducer.hpp>
#include <xtensor/xio.hpp>
#include <xtensor/xarray.hpp>
#include <Rcpp.h>
using namespace Rcpp;

// Assume we take in doubles and return doubles

// [[Rcpp::export]]
SEXP custom_reducer_impl(SEXP x, Rcpp::Function f, std::vector<std::size_t> axes) {

  // Create an rarray of doubles
  const xt::rarray<double>& x_rray = xt::rarray<double>(x);
  
  // This lambda calls out to the R function at each iteration of the reduce call
  // The return value of f is a SEXP, we assume it holds a double under the hood
  auto r_functor_lambda = [f](const auto& left, const auto& right) {
    SEXP f_res = f(left, right);
    double res = REAL(f_res)[0];
    return(res);
  };
  
  // Make an xtensor xreducer from our lambda
  auto xr_functor = xt::make_xreducer_functor(r_functor_lambda);
  
  // THIS IS THE PROBLEM LINE!
  xt::xarray<double> res = xt::reduce(xr_functor, x_rray, axes);
  xt::rarray<double> res2 = res;
  
  std::cout << "xarray: " << res << std::endl;
  std::cout << "rarray: " << res2 << std::endl;
  
  return(res2);
}

// [[Rcpp::export]]
SEXP custom_reducer_impl_bad(SEXP x, Rcpp::Function f, std::vector<std::size_t> axes) {
  
  // Create an rarray of doubles
  const xt::rarray<double>& x_rray = xt::rarray<double>(x);
  
  // This lambda calls out to the R function at each iteration of the reduce call
  // The return value of f is a SEXP, we assume it holds a double under the hood
  auto r_functor_lambda = [f](const auto& left, const auto& right) {
    SEXP f_res = f(left, right);
    double res = REAL(f_res)[0];
    return(res);
  };
  
  // Make an xtensor xreducer from our lambda
  auto xr_functor = xt::make_xreducer_functor(r_functor_lambda);
  
  // THIS IS THE PROBLEM LINE!
  xt::rarray<double> res = xt::reduce(xr_functor, x_rray, axes);
  
  std::cout << "rarray: " << res << std::endl;
  
  return(res);
}

# change to your own path.
Rcpp::sourceCpp("~/Desktop/simple-example.cpp")

small <- matrix(as.double(1:6), ncol = 2)
large <- matrix(as.double(1:5000), ncol = 2)

fn <- function(.x, .y) {.x + .y}

custom_reducer_impl(small, fn, 0L)
#> [1]  6 15
custom_reducer_impl_bad(small, fn, 0L)
#> [1]  6 15

custom_reducer_impl(large, fn, 0L)
#> [1] 3126250 9376250

# should crash
# custom_reducer_impl_bad(large, fn, 0L)

Update) Updated the comments to be more relevant

Dec 11 '18 14:12 DavisVaughan

I've used a lambda instead of the separate functor idea, but I think the same idea is still holding

Dec 11 '18 14:12 DavisVaughan

Cool! I'll get back to work on this tomorrow.

Btw. there is also the immediate_evaluation mode, which is not lazy, but faster (and assigns to a intermediate xarray). It could be interesting for you!

Dec 11 '18 16:12 wolfv

There was one other idea I wanted to try. By-passing Rcpp::Function and using the bare bones C interface way to call the R function using Rf_eval(). The purrr package uses this approach, so I followed their template, specifically map().

I noticed a few things here:

It is much faster! See the benchmark at the very end. 3 in the benchmark is the current way, 4 is the bare bones way. Still slower than base R, but close enough that the gains in flexibility may make this worth it.
It it still a little unstable when assigning right to an rarray<double> rather than going to xarray then rarray.

Regarding the instability, you do get different results than before when we used Rcpp::Function. In this case, when assigning right to rarray<double>, you generally get the correct result for matrices with 5000 elements, but using 50000 elements becomes unstable. It sometimes gives the right result, and other times gives an answer with strange "promise" attributes and an incorrect dim attribute. (promises in R are, as far as I know, like unevaluated expressions). Eventually, running that version enough times will crash R.

Below, the cpp function custom_reducer_bare_call() holds the implementation of the "bare bones" version. It currently uses the working version, assigning first to xarray then to rarray. Under // THIS IS THE PROBLEM LINE! if you switch out the assignment for the commented out code, you can try it with the direct assignment to an rarray object.

These results seem to tell me either:

a) There is still something wrong with the way xexpressions are assigned to rarray objects b) Something needs to "force" the promised elements when going directly to rarray objects. I'm not sure if that is something I need to do, of if xtensor should do it. I also don't know how they get forced when going to xarray first.

// [[Rcpp::depends(xtensorrr)]]
// [[Rcpp::plugins(cpp14)]]

#include <xtensor-r/rarray.hpp>
#include <xtensor/xreducer.hpp>
#include <xtensor/xio.hpp>
#include <xtensor/xarray.hpp>
#include <Rcpp.h>
using namespace Rcpp;

// Assume we take in doubles and return doubles

// [[Rcpp::export]]
SEXP custom_reducer_impl(SEXP x, Rcpp::Function f, std::vector<std::size_t> axes) {
  
  // Create an rarray of doubles
  const xt::rarray<double>& x_rray = xt::rarray<double>(x);
  
  // This lambda calls out to the R function at each iteration of the reduce call
  // The return value of f is a SEXP, we assume it holds a double under the hood
  auto r_functor_lambda = [f](const auto& left, const auto& right) {
    SEXP f_res = f(left, right);
    double res = REAL(f_res)[0];
    return(res);
  };
  
  // Make an xtensor xreducer from our lambda
  auto xr_functor = xt::make_xreducer_functor(r_functor_lambda);
  
  // THIS IS THE PROBLEM LINE!
  xt::xarray<double> res = xt::reduce(xr_functor, x_rray, axes);
  xt::rarray<double> res2 = res;
  
  return(res2);
}

// [[Rcpp::export]]
SEXP custom_reducer_impl_bad(SEXP x, Rcpp::Function f, std::vector<std::size_t> axes) {
  
  // Create an rarray of doubles
  const xt::rarray<double>& x_rray = xt::rarray<double>(x);
  
  // This lambda calls out to the R function at each iteration of the reduce call
  // The return value of f is a SEXP, we assume it holds a double under the hood
  auto r_functor_lambda = [f](const auto& left, const auto& right) {
    SEXP f_res = f(left, right);
    double res = REAL(f_res)[0];
    return(res);
  };
  
  // Make an xtensor xreducer from our lambda
  auto xr_functor = xt::make_xreducer_functor(r_functor_lambda);
  
  // THIS IS THE PROBLEM LINE!
  xt::rarray<double> res = xt::reduce(xr_functor, x_rray, axes);
  
  return(res);
}

// [[Rcpp::export]]
SEXP custom_reducer_bare_call(SEXP x, SEXP f_name_, std::vector<std::size_t> axes, SEXP env) {
  
  // get the name of the function object in the env
  // install it as f
  const char* f_name = CHAR(Rf_asChar(f_name_));
  SEXP f = Rf_install(f_name);
  
  // We also need left_ and right_ installed
  // to shove them in the call before the reduce loop
  SEXP left_ = Rf_install("left_");
  SEXP right_  = Rf_install("right_");
  
  // Create the call like f(left_, right_)
  // Ignoring dots for now
  SEXP f_call = PROTECT(Rf_lang3(f, left_, right_));
  
  // Create an rarray of doubles
  const xt::rarray<double>& x_rray = xt::rarray<double>(x);
  
  // This lambda calls out to the R function at each iteration of the reduce call
  // The return value of f is a SEXP, we assume it holds a double under the hood
  auto r_functor_lambda = [f_call, env](const auto& left, const auto& right) {
    
    // left and right should be length 1
    // Assume they are REAL for now
    SEXP left_val  = PROTECT(Rf_allocVector(REALSXP, 1));
    SEXP right_val = PROTECT(Rf_allocVector(REALSXP, 1));
    
    // Shove in the actual values
    REAL(left_val)[0] = left;
    REAL(right_val)[0] = right;
    
    // Define left_ again (for the existing call to find)
    // And assign its value to be the SEXP left_val
    SEXP left_ = Rf_install("left_");
    Rf_defineVar(left_, left_val, env);
    
    // Same for right_
    SEXP right_ = Rf_install("right_");
    Rf_defineVar(right_, right_val, env);
    
    // Eval the call in the right env
    SEXP f_res = PROTECT(Rf_eval(f_call, env));
    //SEXP f_res = PROTECT(R_forceAndCall(f_call, 2, env));
    
    // Pull out the result
    double res = REAL(f_res)[0];

    UNPROTECT(3);
    
    return(res);
  };
  
  // Make an xtensor xreducer from our lambda
  auto xr_functor = xt::make_xreducer_functor(r_functor_lambda);
  
  // THIS IS THE PROBLEM LINE!
  xt::xarray<double> res2 = xt::reduce(xr_functor, x_rray, axes);
  xt::rarray<double> res = res2;
  //xt::rarray<double> res = xt::reduce(xr_functor, x_rray, axes);
  
  UNPROTECT(1);
  
  return(res);
}

# change to your own path.
Rcpp::sourceCpp("~/Desktop/simple-example.cpp")

small <- matrix(as.double(1:6), ncol = 2)
large <- matrix(as.double(1:5000), ncol = 2)
reallarge <- matrix(as.double(1:50000), ncol = 2)

fn <- function(.x, .y) {.x + .y}
fn2 <- function(.x, .y) {.x}

# //////////////////////////////////////////////////////////////////////////////
# sum-reduce

custom_reducer_impl(small, fn, 0L)
#> [1]  6 15
custom_reducer_impl_bad(small, fn, 0L) # somehow this works
#> [1]  6 15

custom_reducer_impl(large, fn, 0L)
#> [1] 3126250 9376250
# should crash
# custom_reducer_impl_bad(large, fn, 0L)

# //////////////////////////////////////////////////////////////////////////////
# reduce but just return x back

# All good here
custom_reducer_impl(small, fn2, 0L)
#> [1] 1 4
custom_reducer_impl_bad(small, fn2, 0L)
#> [1] 1 4

custom_reducer_impl(large, fn2, 0L)
#> [1]    1 2501
# Crashes
# custom_reducer_impl_bad(large, fn2, 0L)

# //////////////////////////////////////////////////////////////////////////////
# Bare bones version

custom_reducer_impl_bare <- function(.x, .f, ...) {
  .f <- rlang::as_function(.f)
  custom_reducer_bare_call(.x, ".f", 0L, env = environment())
}

custom_reducer_impl_bare(small, fn)
#> [1]  6 15

# its alive!
custom_reducer_impl_bare(large, fn)
#> [1] 3126250 9376250

# this works too!
custom_reducer_impl_bare(reallarge, fn)
#> [1] 312512500 937512500

# //////////////////////////////////////////////////////////////////////////////
# Benchmarks

# Really large
bench::mark(
  as.array(colSums(reallarge)),
  rray:::rray_reducer_cpp("sum", reallarge, 0L),
  custom_reducer_impl(reallarge, fn, 0L), # <- functor + Rcpp::Function
  custom_reducer_impl_bare(reallarge, fn) # <- bare bones
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 4 x 10
#>   expression     min    mean  median      max `itr/sec` mem_alloc  n_gc
#>   <chr>      <bch:t> <bch:t> <bch:t> <bch:tm>     <dbl> <bch:byt> <dbl>
#> 1 as.array(…  37.6µs  40.4µs    39µs   1.48ms  24734.      7.39KB     2
#> 2 "rray:::r…  98.1µs 276.9µs 255.7µs    9.9ms   3612.       1.5MB    13
#> 3 custom_re…   472ms 479.2ms 479.2ms 486.46ms      2.09    2.49KB    37
#> 4 custom_re…  18.4ms  22.1ms  20.4ms  55.75ms     45.2     2.49KB    23
#> # … with 2 more variables: n_itr <int>, total_time <bch:tm>

^{Created on 2018-12-11 by the reprex package (v0.2.1.9000)}

Dec 11 '18 16:12 DavisVaughan

immediate_evaluation mode

I'll check that out! Thanks!

Dec 11 '18 16:12 DavisVaughan

Debugging this a little more just to try and narrow in on the issue. Here are a few simpler variations:

custom_reducer_impl_simple - The functor returns 1, assign straight to rarray<double>
custom_reducer_impl_simple_error - The functor returns 1, a useless NumericVector is created in the functor, assign straight to rarray<double>
custom_reducer_impl_simple_works - Same as 2, but assign to xarray then to rarray.

2 fails randomly on large matrices when you try and print the result out, and I have no idea why. I'm generally getting the error message 'getCharCE' must be called on a CHARSXP, but I also sometimes get results that don't have the right dim attribute or are just the wrong class (sometimes I get character or logical vectors back for some reason).

The interesting thing here is that this happens even though I am not accessing left or right at all. Just calling NumericVector inside the functor is enough to break things, apparently.

// [[Rcpp::depends(xtensorrr)]]
// [[Rcpp::plugins(cpp14)]]

#include <xtensor-r/rarray.hpp>
#include <xtensor/xreducer.hpp>
#include <xtensor/xio.hpp>
#include <xtensor/xarray.hpp>
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
SEXP custom_reducer_impl_simple(SEXP x, std::vector<std::size_t> axes) {

  const xt::rarray<double>& x_rray = xt::rarray<double>(x);

  // Just return the double value of 1
  auto r_functor_lambda = [](const auto& left, const auto& right) {
    double res = 1;
    return(res);
  };

  auto xr_functor = xt::make_xreducer_functor(r_functor_lambda);

  xt::rarray<double> res = xt::reduce(xr_functor, x_rray, axes);

  return(res);
}

// [[Rcpp::export]]
SEXP custom_reducer_impl_simple_error(SEXP x, std::vector<std::size_t> axes) {

  const xt::rarray<double>& x_rray = xt::rarray<double>(x);

  // Create a NumericVector in there
  auto r_functor_lambda = [](const auto& left, const auto& right) {
    NumericVector l(1, 1.0);
    double res = 1;
    return(res);
  };

  auto xr_functor = xt::make_xreducer_functor(r_functor_lambda);

  xt::rarray<double> res = xt::reduce(xr_functor, x_rray, axes);

  return(res);
}

// [[Rcpp::export]]
SEXP custom_reducer_impl_simple_works(SEXP x, std::vector<std::size_t> axes) {

  const xt::rarray<double>& x_rray = xt::rarray<double>(x);

  // Create a NumericVector in there
  auto r_functor_lambda = [](const auto& left, const auto& right) {
    NumericVector l(1, 1.0);
    double res = 1;
    return(res);
  };

  auto xr_functor = xt::make_xreducer_functor(r_functor_lambda);

  xt::xarray<double> res2 = xt::reduce(xr_functor, x_rray, axes);
  xt::rarray<double> res = res2;

  return(res);
}

# change to your own path.
Rcpp::sourceCpp("~/Desktop/example.cpp")

real_large <- matrix(as.double(1:50000), ncol = 2)

# Fine
for(i in 1:10000) {
  x <- custom_reducer_impl_simple(real_large, 0L)
}

# This should error if you try it
# If you comment out the print(x) line, it probably wont error
for(i in 1:10000) {
  x <- custom_reducer_impl_simple_error(real_large, 0L)
  print(x)
}

# Fine
for(i in 1:10000) {
  x <- custom_reducer_impl_simple_works(real_large, 0L)
  print(x)
}

Dec 19 '18 18:12 DavisVaughan

Hi @DavisVaughan

just FYI I just found a bug in the reducers for column_major (which is always the case in R). Maybe this bug will also fix our issues here! :)

Dec 21 '18 09:12 wolfv

ah, forget what I just said, it's just a bug with the new keep_dim feature :/

Dec 21 '18 09:12 wolfv

Should this issue be closed?

Apr 07 '19 17:04 SylvainCorlay

I'm still having the error, but custom reducers are low on my priority list right now, and probably won't make it into the first version of rray.

That said, if you throw out the rest of the issue and focus on this comment: https://github.com/QuantStack/xtensor-r/issues/76#issuecomment-448689755, then that is the simplest way to showcase the error.

Apr 07 '19 18:04 DavisVaughan

Hey @DavisVaughan FYI, we have worked quite a bit on Xtensor reducers recently (thanks to the hard work of @AdrienDelsalle) in core Xtensor, and that issue should now be fixed!

Mar 02 '21 16:03 SylvainCorlay

xtensor-r xtensor-r copied to clipboard

reducer assignment issues

xtensor-r
xtensor-r copied to clipboard