timeline for big.sparse.matrix support?

Open YaohuiZeng opened this issue 8 years ago • 9 comments

Thanks for this great package. I have a package that depends on bigmemory, and I would now like to add sparse matrix support to it. I noticed that big.sparse.matrix is on your wish list. Do you have any plans to implement it, and if so, is there an expected timeline?

Of course, I could use the sparse matrix classes from the Eigen or Armadillo libraries, but those don't support memory mapping, which is the key feature I need.

YaohuiZeng avatar Jan 17 '17 22:01 YaohuiZeng

I think the two main active developers are overbooked at the moment.

I think the difficulty of this implementation depends on what you actually need. If you only need a C++ accessor (i, j), I think it should be relatively easy to do by defining another class. If you are interested, we could try it in another package and merge it later if it works.

privefl avatar Jan 18 '17 09:01 privefl

@privefl I haven't thought about the design yet, but basically I need a memory-mapped version of, for example, SpMat in Armadillo. If that's what you have in mind, I'd be very glad to work with you to get this implemented, though I am not sure how much work it requires.

YaohuiZeng avatar Jan 18 '17 15:01 YaohuiZeng

Hi @YaohuiZeng, as @privefl said, Mike and I are quite busy. I would love to give more attention to this package, but I am tied up with other work-related matters as well as maintaining my other packages. If you are interested in contributing, we have begun moving development to my fork of the package. Although I am not free enough to write much code, I am able to monitor merge requests and respond to questions.

I'm not sure how familiar you are with C++, so I'm not sure how deep you want to get into this. Ideally, what I had in mind is to have a child class inherit from the parent BigMatrix classes, but I don't know if it will work that simply. I suspect it won't, and that it will require another distinct class such as SparsesBigMatrix mirroring the structure in the file here. In either case, we will want to look into the boost library for sparse support/functionality. That is ultimately where the heart of this package is rooted. If you can find the support within boost and get an idea of how it works, we will have a very solid starting point.

cdeterman avatar Jan 18 '17 16:01 cdeterman

I was thinking more about a naive implementation:

  • create an R class with i, p, Dim, x slots (just like a dgCMatrix) with i and x being one-column big.matrix objects.
  • then create a custom accessor (i, j) for this class in C++.

Maybe too naive.
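
For illustration only (this is not code from this thread or from any package): a minimal C++ sketch of what such an (i, j) accessor could look like on a compressed sparse column layout. The name `CscMatrix` and its `at` method are hypothetical, `std::vector` stands in for the one-column big.matrix objects that would hold `i` and `x`, and row indices are assumed to be 0-based and sorted within each column, as in a dgCMatrix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container; field names mirror the dgCMatrix slots.
// Assumption: in the proposed design, `i` and `x` would be backed by
// one-column big.matrix objects instead of std::vector.
struct CscMatrix {
    std::size_t nrow, ncol;   // Dim
    std::vector<int> p;       // column pointers, length ncol + 1
    std::vector<int> i;       // 0-based row indices, sorted within each column
    std::vector<double> x;    // stored (nonzero) values, parallel to i

    // Value at (row, col); entries that are not stored are zero.
    double at(std::size_t row, std::size_t col) const {
        assert(row < nrow && col < ncol);
        // The stored entries of column `col` occupy positions p[col] .. p[col + 1] - 1.
        auto first = i.begin() + p[col];
        auto last = i.begin() + p[col + 1];
        // Binary search for the row index inside that column.
        auto it = std::lower_bound(first, last, static_cast<int>(row));
        if (it != last && *it == static_cast<int>(row)) {
            return x[static_cast<std::size_t>(it - i.begin())];
        }
        return 0.0;
    }
};
```

A lookup costs a binary search over the stored entries of one column, so column-wise traversal is cheap while random (i, j) access is slower than for a dense big.matrix.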

privefl avatar Jan 18 '17 17:01 privefl

@privefl That may lead to some performance gains with customized functions only, but it won't provide any actual compression (i.e. reduce the memory footprint) or the ability to interface nicely with other libraries like Armadillo (I'm unsure how converting from a normal matrix to a compressed one would work). We can experiment, but I personally still think exploring more use of boost is likely the way to go. Of course, the other authors are welcome to add their opinions as well @kaneplusplus @phaverty

cdeterman avatar Jan 18 '17 17:01 cdeterman

@privefl, I think your design is more like another implementation of SpMat in Armadillo, i.e., no memory-mapping involved, is there? My best guess is that we may still have to go with boost, just as @cdeterman said.

YaohuiZeng avatar Jan 18 '17 17:01 YaohuiZeng

Okay, I was pointing more to a light implementation that would only support a restricted set of features. I don't think @YaohuiZeng needs all the features available for a standard big.matrix.

What I had in mind was https://github.com/privefl/spBigMatrix. There, you can see the results

  • for converting from a dgCMatrix,
  • accessing the matrix in C++
  • and computing a cross-product with a vector.
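
As a rough illustration of the last point (a sketch, not the code in that repository): assuming "cross-product" means t(A) %*% y as in R's crossprod(), and assuming a compressed sparse column layout with dgCMatrix-style `p`, `i`, `x` arrays, the product reduces to one sparse dot product per column. The function name `crossprod_csc` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes t(A) %*% y for a matrix A stored in compressed sparse column form.
// p: column pointers (length ncol + 1), i: 0-based row indices,
// x: stored values, y: dense vector with one entry per row of A.
std::vector<double> crossprod_csc(const std::vector<int>& p,
                                  const std::vector<int>& i,
                                  const std::vector<double>& x,
                                  const std::vector<double>& y) {
    assert(!p.empty() && i.size() == x.size());
    const std::size_t ncol = p.size() - 1;
    std::vector<double> res(ncol, 0.0);
    for (std::size_t j = 0; j < ncol; ++j) {
        double acc = 0.0;
        // Only the stored entries of column j contribute to res[j].
        for (int k = p[j]; k < p[j + 1]; ++k) {
            acc += x[static_cast<std::size_t>(k)] * y[static_cast<std::size_t>(i[k])];
        }
        res[j] = acc;
    }
    return res;
}
```

Because each column's entries are contiguous in `i` and `x`, this pass reads both arrays sequentially, which suits on-disk or memory-mapped storage.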

privefl avatar Jan 19 '17 10:01 privefl

@privefl A light implementation like this could really go a long way! Like @YaohuiZeng, I'd be quite interested in this support for bigmemory

jaredhuling avatar Mar 16 '17 01:03 jaredhuling

I've restarted my project of building an on-disk sparse matrix format; it has now become https://cran.r-project.org/web/packages/bigsparser/index.html

This is still a very light implementation but there are already some useful features that I've started using in my work.

privefl avatar Oct 28 '20 17:10 privefl