bigmemory icon indicating copy to clipboard operation
bigmemory copied to clipboard

How to use omp in Rcpp to speed up changes to the value of big.matrix?

Open Sherry520 opened this issue 3 months ago • 2 comments

I am trying to compute the Hadamard product of all column combinations of the two matrices (epi1,epi2) to form a new matrix that far exceeds the maximum number of columns supported by R and takes up a lot of memory. So I decided to use big.memory to store this matrix on disk (I have not yet tested whether big.matrix supports matrices with more than 2^32-1 columns). However, this calculation process is time-consuming if the for loop is used, and foreach in R and dopar cannot operate on big.matrix, so I decided to use Rcpp interface, use C++ language for calculation, and use openmpi to speed up the calculation process. I'm really not familiar with C++, and I use AI tools to help me program, but it doesn't seem believable.

Problem I encountered:

  • [1]. How to assign the final calculated variable result to big.matrix?
  • [2]. Whether return epi_data is required here, or whether it will automatically modify the external big.matrix?

R code:

epi_data <- filebacked.big.matrix(
  nrow = nrow(epi1),
  ncol = ncol(combos),
  type = "double", # char是c++的单个字符
  backingfile = "epi_data.bin", 
  backingpath = dirname("epi_datai"), 
  descriptorfile = "epi_data.des",
  dimnames = c(NULL, NULL)
)

R code:(This is the algorithm I'm going to implement with C++)

epi_data <- foreach(x=iter(combos, by='col'),.combine = "cbind",.inorder = TRUE) %dopar% {
  TRAN %*% (epi1[, x[1]] * epi2[, x[2]])
}

C++ code:

// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(bigmemory)]]

#include <RcppArmadillo.h>
#include <bigmemory/MatrixAccessor.hpp>
#include <omp.h>


using namespace std;
using namespace Rcpp;

// epi_data: big.matrix to save result
// combos: all combn of epi1 and epi2

// [[Rcpp::export]]
Rcpp::XPtr<BigMatrix> ca_epi_data(Rcpp::XPtr<BigMatrix> epi_data,
                                  Rcpp::XPtr<BigMatrix> combos,
                                  arma::mat epi1,
                                  arma::mat epi2,
                                  arma::mat TRAN) {
  omp_set_num_threads(4);
  
  typedef double T;
  MatrixAccessor<T> epi_col = MatrixAccessor<T>(*epi_data);
  MatrixAccessor<T> combos_col = MatrixAccessor<T>(*combos);
  
  int m = combos->nrow();
  
  #pragma omp parallel for
  for(int i=0; i < m;i++) {
    arma::colvec row_epi1 = epi1.row(i);
    arma::colvec row_epi2 = epi2.row(i);
      
      // calculate hadamard product
      arma::colvec hadamard_product = row_epi1 % row_epi2;
      
      // matrix *
      arma::colvec result = TRAN * hadamard_product;
      
      // apply the result to big.matrix
      epi_col[i] = result;
    }
  }
  return epi_data;
}

Sherry520 avatar Nov 20 '24 12:11 Sherry520