proxyC
Dev mask update
Added maskUpdate() to create complex patterns by combining pattern matrices. The maskUpdate() call in the example below is equivalent to a logical operation on two pattern matrices,
msk2 <- msk | mask(colnames(mt1), colnames(mt3))
but creating multiple pattern matrices takes a lot of RAM, and some of them can be dense and very large. I am happy to change the names of the function and its arguments if you have better ideas.
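As a rough illustration of the logical operation that maskUpdate() is meant to replace, the sketch below builds two pattern matrices with the Matrix package only and combines them with an element-wise OR. The helper pattern() is hypothetical and merely stands in for mask(); it is not proxyC's implementation.

```r
library(Matrix)

# Hypothetical stand-in for mask(): TRUE where a row name equals a column name.
pattern <- function(x, y) {
  Matrix(outer(x, y, "=="), sparse = TRUE)
}

n1 <- c("a", "a", "d", "d", "e", "e")  # column names of mt1
n2 <- c("a", "b", "c", "d", "e")       # column names of mt2
n3 <- c("e", "e", "e", "e", "e")       # column names of mt3

p12 <- pattern(n1, n2)  # first pattern matrix
p13 <- pattern(n1, n3)  # second pattern matrix, held in memory at the same time

# Element-wise OR: a cell is selected if either pattern selects it.
combined <- p12 | p13
print(combined)
```

Both p12 and p13 must exist in memory before they are combined, which is the cost the description above refers to.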
> mt1 <- Matrix::rsparsematrix(100, 6, 1.0)
> colnames(mt1) <- c("a", "a", "d", "d", "e", "e")
> mt2 <- Matrix::rsparsematrix(100, 5, 1.0)
> colnames(mt2) <- c("a", "b", "c", "d", "e")
> mt3 <- Matrix::rsparsematrix(100, 5, 1.0)
> colnames(mt3) <- c("e", "e", "e", "e", "e")
>
> # create a pattern matrix
> (msk <- mask(colnames(mt1), colnames(mt2)))
6 x 5 sparse Matrix of class "lgTMatrix"
  a b c d e
a | . . . .
a | . . . .
d . . . | .
d . . . | .
e . . . . |
e . . . . |
> simil(mt1, mt2, margin = 2, mask = msk, drop0 = TRUE)
6 x 5 sparse Matrix of class "dgTMatrix"
   a          b c  d           e
a  0.03070049 . .  .           .
a -0.03691416 . .  .           .
d  .          . . -0.02489279  .
d  .          . .  0.03654560  .
e  .          . .  .          -0.102052365
e  .          . .  .           0.003882867
>
> # update a pattern matrix
> (msk2 <- maskUpdate(msk, colnames(mt1), colnames(mt3), operator = "or"))
6 x 5 sparse Matrix of class "lgTMatrix"
  a b c d e
a | . . . .
a | . . . .
d . . . | .
d . . . | .
e | | | | |
e | | | | |
> simil(mt1, mt2, margin = 2, mask = msk2, drop0 = TRUE)
6 x 5 sparse Matrix of class "dgTMatrix"
   a          b           c          d           e
a  0.03070049 .           .          .           .
a -0.03691416 .           .          .           .
d  .          .           .         -0.02489279  .
d  .          .           .          0.03654560  .
e  0.03039425 0.10977245  0.1437414 -0.16394760 -0.102052365
e -0.09407865 0.05965081 -0.0605547 -0.15769245  0.003882867
A related change is calling cpp_pair() instead of cpp_linear() when a mask is used, because cpp_pair() is faster when pattern matrices are very sparse. We might want to add an argument like pairwise = NULL so that users can choose the underlying function based on the sparsity of their input and output matrices.
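One way a pairwise = NULL default could be resolved is sketched below, assuming the choice is driven by the density of the pattern matrix. The function use_pairwise() and its threshold argument are hypothetical, and the cut-off of 0.1 is an arbitrary placeholder rather than a benchmarked value.

```r
library(Matrix)

# Hypothetical helper: decide whether to compute similarities pair by pair.
# `threshold` is an illustrative cut-off, not a measured break-even point.
use_pairwise <- function(mask, pairwise = NULL, threshold = 0.1) {
  if (!is.null(pairwise)) {
    return(isTRUE(pairwise))  # an explicit user choice always wins
  }
  density <- nnzero(mask) / prod(dim(mask))  # share of selected cells
  density < threshold
}

sparse_mask <- sparseMatrix(i = 1:3, j = 1:3, x = TRUE, dims = c(100, 100))
dense_mask <- matrix(TRUE, 10, 10)

res_sparse <- use_pairwise(sparse_mask)                  # very sparse mask
res_dense <- use_pairwise(dense_mask)                    # fully dense mask
res_forced <- use_pairwise(dense_mask, pairwise = TRUE)  # user override
```

Keeping NULL as the default would let the package pick automatically while still allowing users to force either code path.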