ENH: thoughts on supporting sparse data?

Open mortonjt opened this issue 3 years ago • 2 comments

Very excited to try out this package. I've put together a barebones qiime2 plugin to give this package a whirl on some of our larger datasets: https://github.com/mortonjt/q2-decontam

As far as I can tell, isContaminant() takes a dense table of counts. However, memory use quickly blows up for tables with more than 10k samples. Would there be any interest in supporting sparse matrix inputs in the future?

mortonjt, Jun 01 '21 18:06

Sparse matrix support hasn't been requested before. There is good support for sparse matrix representations through the Matrix package in R, so possibly this would be straightforward to implement within R. But for it to work within a Q2 plugin, the data transfer from the Q2 files into R would need to be sparse-aware as well.
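
For illustration only, here is a minimal sketch of what working with counts in a Matrix sparse class might look like. It is not decontam code, and whether isContaminant() could use such objects is an assumption; the toy `dense` table and the row normalization are made up for the example.

```r
library(Matrix)

# Toy samples-by-features count table (hypothetical data).
dense <- matrix(c(4, 0, 0, 6,
                  0, 0, 3, 0,
                  0, 1, 0, 1),
                nrow = 3, byrow = TRUE)

# Store only the nonzero entries (compressed sparse column format).
counts <- Matrix(dense, sparse = TRUE)

# Per-sample totals computed on the sparse object, without densifying.
totals <- rowSums(counts)

# Scale each row to relative frequencies; multiplying by a diagonal
# matrix keeps the result sparse.
freq <- Diagonal(x = 1 / totals) %*% counts
```

The point of the sketch is only that common operations such as row sums and row scaling have sparse-aware methods in Matrix, so they need not expand the table to a dense matrix.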

Tagging as enhancement.

benjjneb, Jun 03 '21 22:06

👍
Regarding the data transfer, the most trivial way around this is to pass the matrices through a sparse coordinate format - it'll just boil down to a data frame of 3 columns (row, column, count). And it looks like Matrix supports different sparse matrix representations, so it should be ok to hop between different formats.
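
A rough sketch of the coordinate-format idea on the R side, assuming the triplets arrive as a data frame with 1-based indices. The column names (`row`, `col`, `count`) and the toy values are hypothetical, not an existing interface of decontam or the plugin.

```r
library(Matrix)

# Hypothetical triplet table: one row per nonzero count.
triplets <- data.frame(
  row   = c(1L, 1L, 2L, 3L),
  col   = c(1L, 3L, 2L, 3L),
  count = c(5, 2, 7, 1)
)

# Build a sparse matrix from the triplets; dims must be given explicitly
# so that all-zero trailing rows or columns are not dropped.
counts <- sparseMatrix(
  i = triplets$row,
  j = triplets$col,
  x = triplets$count,
  dims = c(3L, 4L)
)

# Hop between representations: coordinate (triplet) and row-compressed.
counts_coo <- as(counts, "TsparseMatrix")
counts_csr <- as(counts, "RsparseMatrix")
```

By default `sparseMatrix()` returns the column-compressed form, and the `as()` coercions convert to the other layouts without going through a dense matrix.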

mortonjt, Jun 03 '21 23:06