proxyC

Dev mask update

Open koheiw opened this issue 5 months ago • 0 comments

Added maskUpdate() to create complex patterns by combining pattern matrices. The operations below are equivalent to logical operations on matrices, for example

 msk2 <- msk | mask(colnames(mt1), colnames(mt3))

but creating multiple pattern matrices takes a lot of RAM, and some of them can be dense and very large. I am happy to change the names of the function and arguments if you have better ideas.
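For readers less familiar with pattern matrices, here is a minimal sketch of the logical-operation equivalent using only base R and the Matrix package. `name_mask()` is a hypothetical helper, not proxyC's `mask()`. It assumes a pattern matrix is TRUE wherever a row name equals a column name, which is consistent with the `msk` and `msk2` outputs in the session below. Note that the "or" route has to materialise a second complete pattern matrix (`extra`) before combining it.

```r
library(Matrix)

# Hypothetical helper, NOT proxyC's mask(): returns a logical sparse
# matrix that is TRUE wherever x[i] == y[j].
name_mask <- function(x, y) {
  # Join positions of x and y on their names to get the matching (i, j) pairs
  m <- merge(data.frame(i = seq_along(x), key = x),
             data.frame(j = seq_along(y), key = y),
             by = "key")
  sparseMatrix(i = m$i, j = m$j, x = rep(TRUE, nrow(m)),
               dims = c(length(x), length(y)))
}

x1 <- c("a", "a", "d", "d", "e", "e")  # colnames(mt1)
x2 <- c("a", "b", "c", "d", "e")       # colnames(mt2)
x3 <- c("e", "e", "e", "e", "e")       # colnames(mt3)

msk <- name_mask(x1, x2)

# Logical-operation equivalent of an "or" update: a second, complete
# pattern matrix is built first and then combined with msk.
extra <- name_mask(x1, x3)
msk2  <- msk | extra

# The same idea with "&" keeps only cells that are TRUE in both
msk_and <- msk & extra

as.matrix(msk2)
```

Rows 5 and 6 of `extra` are entirely TRUE, which illustrates how a single repeated name can make an intermediate pattern matrix dense.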

> mt1 <- Matrix::rsparsematrix(100, 6, 1.0)
> colnames(mt1) <- c("a", "a", "d", "d", "e", "e")
> mt2 <- Matrix::rsparsematrix(100, 5, 1.0)
> colnames(mt2) <- c("a", "b", "c", "d", "e")
> mt3 <- Matrix::rsparsematrix(100, 5, 1.0)
> colnames(mt3) <- c("e", "e", "e", "e", "e")
> 
> # create a pattern matrix
> (msk <- mask(colnames(mt1), colnames(mt2)))
6 x 5 sparse Matrix of class "lgTMatrix"
  a b c d e
a | . . . .
a | . . . .
d . . . | .
d . . . | .
e . . . . |
e . . . . |
> simil(mt1, mt2, margin = 2, mask = msk, drop0 = TRUE)
6 x 5 sparse Matrix of class "dgTMatrix"
            a b c           d            e
a  0.03070049 . .  .           .          
a -0.03691416 . .  .           .          
d  .          . . -0.02489279  .          
d  .          . .  0.03654560  .          
e  .          . .  .          -0.102052365
e  .          . .  .           0.003882867
> 
> # update a pattern matrix
> (msk2 <- maskUpdate(msk, colnames(mt1), colnames(mt3), operator = "or"))
6 x 5 sparse Matrix of class "lgTMatrix"
  a b c d e
a | . . . .
a | . . . .
d . . . | .
d . . . | .
e | | | | |
e | | | | |
> simil(mt1, mt2, margin = 2, mask = msk2, drop0 = TRUE)
6 x 5 sparse Matrix of class "dgTMatrix"
            a          b          c           d            e
a  0.03070049 .           .          .           .          
a -0.03691416 .           .          .           .          
d  .          .           .         -0.02489279  .          
d  .          .           .          0.03654560  .          
e  0.03039425 0.10977245  0.1437414 -0.16394760 -0.102052365
e -0.09407865 0.05965081 -0.0605547 -0.15769245  0.003882867
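To make explicit what a masked call computes, the sketch below evaluates a similarity only at the cells that are TRUE in the mask and leaves every other cell empty, one dot product per allowed pair. It is a plain-R illustration under stated assumptions, not proxyC's implementation. `masked_cosine()` is hypothetical, and cosine similarity is assumed as the measure purely for illustration. The mask reproduces the `msk2` pattern above.

```r
library(Matrix)

# Hypothetical sketch, not proxyC's implementation: cosine similarity
# between the columns of x and y, evaluated only at cells that are
# TRUE in a logical sparse `mask`.
masked_cosine <- function(x, y, mask) {
  cells <- as(mask, "TsparseMatrix")
  keep <- cells@x                 # drop any explicitly stored FALSE entries
  i <- cells@i[keep] + 1L         # triplet slots are 0-based
  j <- cells@j[keep] + 1L
  norm_x <- sqrt(colSums(x^2))
  norm_y <- sqrt(colSums(y^2))
  # One dot product per allowed pair; cells outside the mask are never computed
  vals <- vapply(seq_along(i), function(k) {
    sum(x[, i[k]] * y[, j[k]]) / (norm_x[i[k]] * norm_y[j[k]])
  }, numeric(1))
  sparseMatrix(i = i, j = j, x = vals, dims = c(ncol(x), ncol(y)))
}

set.seed(1)
x <- matrix(rnorm(600), 100, 6)
y <- matrix(rnorm(500), 100, 5)
mask <- Matrix(rbind(c(TRUE, FALSE, FALSE, FALSE, FALSE),
                     c(TRUE, FALSE, FALSE, FALSE, FALSE),
                     c(FALSE, FALSE, FALSE, TRUE, FALSE),
                     c(FALSE, FALSE, FALSE, TRUE, FALSE),
                     rep(TRUE, 5), rep(TRUE, 5)), sparse = TRUE)
res <- masked_cosine(x, y, mask)
res
```

The cost grows with the number of TRUE cells rather than with `ncol(x) * ncol(y)`, which is why a pair-by-pair strategy can pay off when the mask is very sparse.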

A related change is calling cpp_pair() instead of cpp_linear() when a mask is used. When pattern matrices are very sparse, cpp_pair() is the faster of the two. We might want to add an argument such as pairwise = NULL to let users choose the underlying function based on the sparsity of their input and output matrices.
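One way a `pairwise = NULL` default could behave is sketched below: an explicit TRUE/FALSE from the user wins, and otherwise the choice falls back to the density of the mask. `use_pairwise()` and its `threshold` are hypothetical; the cutoff is an assumed value, not a benchmarked one, and this sketch ignores the sparsity of the input matrices.

```r
library(Matrix)

# Hypothetical helper illustrating a `pairwise = NULL` argument:
# an explicit TRUE/FALSE from the user wins; otherwise pick the
# pairwise strategy when the mask is sparse enough.
# `threshold` is an assumed value, not a benchmarked one.
use_pairwise <- function(mask, pairwise = NULL, threshold = 0.1) {
  if (!is.null(pairwise)) {
    return(isTRUE(pairwise))
  }
  # Share of cells that would actually be computed (as.numeric avoids integer overflow)
  density <- nnzero(mask) / (as.numeric(nrow(mask)) * ncol(mask))
  density < threshold
}

sparse_mask <- sparseMatrix(i = 1:5, j = 1:5, x = rep(TRUE, 5), dims = c(100, 50))
dense_mask  <- Matrix(matrix(TRUE, 10, 5), sparse = TRUE)

use_pairwise(sparse_mask)                    # TRUE  (density 0.001)
use_pairwise(dense_mask)                     # FALSE (density 1)
use_pairwise(sparse_mask, pairwise = FALSE)  # FALSE (user override)
```

A production rule would likely also weigh the sparsity of the input matrices, and the threshold would need to come from benchmarks of the two C++ routines.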

koheiw · Aug 05 '25 02:08