decontam
ENH: thoughts on supporting sparse data?
Very excited to try out this package. I've put together a barebones qiime2 plugin to give this package a whirl on some of our larger datasets: https://github.com/mortonjt/q2-decontam
From what I understand of the package, isContaminant() takes a dense table of counts. However, this quickly blows up when dealing with tables with more than 10k samples. Would there be any interest in supporting sparse matrix inputs in the future?
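To put rough numbers on "blows up", here is a back-of-the-envelope sketch comparing a dense table with a coordinate (row, column, value) layout. The table shape, the 1% density, and the byte sizes (8-byte doubles, 4-byte indices) are all hypothetical assumptions for illustration, not measurements from any real dataset.

```python
# Rough memory estimate: dense count table vs. a sparse coordinate (COO) layout.
# All sizes used below are hypothetical inputs chosen only for illustration.

def dense_bytes(n_samples: int, n_features: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to store every cell of a dense table (8-byte doubles by default)."""
    return n_samples * n_features * bytes_per_value


def coo_bytes(n_samples: int, n_features: int, density: float,
              bytes_per_index: int = 4, bytes_per_value: int = 8) -> int:
    """Bytes for a COO layout: one (row, column, value) triplet per nonzero cell."""
    nnz = int(n_samples * n_features * density)
    return nnz * (2 * bytes_per_index + bytes_per_value)


if __name__ == "__main__":
    # Hypothetical table: 10k samples, 50k features, 1% of cells nonzero.
    samples, features, density = 10_000, 50_000, 0.01
    print(f"dense: {dense_bytes(samples, features) / 1e9:.1f} GB")
    print(f"COO:   {coo_bytes(samples, features, density) / 1e9:.2f} GB")
```

The dense cost grows with every cell, zero or not, while the coordinate cost grows only with the number of nonzero counts, so the gap widens as the table gets sparser.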
Sparse matrix support hasn't been requested before. There is good support for sparse matrix representations through the Matrix package in R, so this may be straightforward to implement within R. But for it to work within a Q2 plugin, the data transfer from the Q2 files into R would need to be sparse-aware as well.
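For context on what the Matrix package stores: its general sparse class dgCMatrix uses a compressed sparse column (CSC) layout with slots `i` (0-based row index of each nonzero), `p` (0-based column pointers) and `x` (the nonzero values). The sketch below builds those three pieces from a small dense table in plain Python so the structure is visible; the function name `dense_to_csc` is hypothetical and this is not how Matrix itself constructs the object.

```python
# Build the compressed-sparse-column (CSC) pieces of a dense count table.
# Slot names i, p, x mirror R's Matrix::dgCMatrix class, which also uses
# 0-based row indices in i and 0-based column pointers in p.
from typing import List, Tuple


def dense_to_csc(
    table: List[List[float]],
) -> Tuple[List[int], List[int], List[float], Tuple[int, int]]:
    n_rows = len(table)
    n_cols = len(table[0]) if n_rows else 0
    i: List[int] = []    # row index of each stored nonzero
    x: List[float] = []  # value of each stored nonzero
    p: List[int] = [0]   # entries of column j live in i[p[j]:p[j+1]], x[p[j]:p[j+1]]
    for j in range(n_cols):          # walk column by column
        for r in range(n_rows):
            v = table[r][j]
            if v != 0:               # zeros are never stored
                i.append(r)
                x.append(v)
        p.append(len(i))             # close column j
    return i, p, x, (n_rows, n_cols)


if __name__ == "__main__":
    i, p, x, shape = dense_to_csc([[0, 3, 0], [1, 0, 0], [0, 2, 5]])
    print("i =", i, "p =", p, "x =", x, "dim =", shape)
```

A column-major layout like this makes per-column operations cheap, which is one reason it is a common default for sparse matrices.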
Tagging as enhancement.
👍
Regarding the data transfer, the simplest way around this is to pass the matrices through a sparse coordinate format: it just boils down to a data frame of 3 columns (row, column, count). And it looks like Matrix supports different sparse matrix representations, so it should be OK to hop between formats.
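A minimal sketch of that coordinate hand-off on the Python side, assuming a plain CSV file with a `row,column,count` header as the exchange format (the file layout and the helper names here are hypothetical, not part of q2-decontam or decontam). Indices are written 1-based because `Matrix::sparseMatrix(i = ..., j = ..., x = ..., dims = ...)` in R expects 1-based indices by default (`index1 = TRUE`). The table shape is passed separately since all-zero trailing rows or columns leave no triplets behind.

```python
# Write a count table as (row, column, count) triplets and read it back.
import csv
import io
from typing import List, TextIO, Tuple

Triplet = Tuple[int, int, float]


def to_triplets(table: List[List[float]], one_based: bool = True) -> List[Triplet]:
    """Keep only nonzero cells; shift indices to 1-based if the reader expects it."""
    off = 1 if one_based else 0
    return [(r + off, c + off, v)
            for r, row in enumerate(table)
            for c, v in enumerate(row)
            if v != 0]


def write_triplets(triplets: List[Triplet], handle: TextIO) -> None:
    """Emit a 3-column CSV: row, column, count."""
    w = csv.writer(handle)
    w.writerow(["row", "column", "count"])
    w.writerows(triplets)


def read_triplets(handle: TextIO) -> List[Triplet]:
    rdr = csv.reader(handle)
    next(rdr)  # skip the header line
    return [(int(r), int(c), float(v)) for r, c, v in rdr]


def from_triplets(triplets: List[Triplet], shape: Tuple[int, int],
                  one_based: bool = True) -> List[List[float]]:
    """Rebuild a dense table; repeated (row, column) pairs are summed."""
    off = 1 if one_based else 0
    n_rows, n_cols = shape
    table = [[0.0] * n_cols for _ in range(n_rows)]
    for r, c, v in triplets:
        table[r - off][c - off] += v
    return table


if __name__ == "__main__":
    counts = [[0, 3, 0], [1, 0, 0], [0, 0, 0]]
    buf = io.StringIO()  # stands in for a file opened with newline=""
    write_triplets(to_triplets(counts), buf)
    buf.seek(0)
    print(from_triplets(read_triplets(buf), shape=(3, 3)))
```

On the R side the same three columns could be read with `read.csv` and passed to `sparseMatrix` along with `dims`, after which Matrix's coercion methods handle moving between its sparse representations.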