RcppML icon indicating copy to clipboard operation
RcppML copied to clipboard

Upper-bounded NNLS

Open zdebruine opened this issue 3 years ago • 12 comments

Feature request to impose upper bound on NNLS solutions.

zdebruine avatar Oct 24 '22 15:10 zdebruine

Addressed in 969c62eb4ff47ce7035bd8d534500ecc072e77df and 22953f8a4783b8d4bd4e5a098050d24d33d0de9a, upper bound now can be imposed in the nnls, predict.nmf, and project functions.

Theoretical advantages remain to be seen.

zdebruine avatar Oct 24 '22 15:10 zdebruine

Hi Zach, super quick implementation! Thanks!! I am compiling the package on my Mac (using clang) and I am getting:

In file included from ../inst/include/RcppML.hpp:17:
../inst/include/RcppML/nmf.hpp:100:21: error: cannot assign to variable 'upper_bound' with const-qualified type 'const double'
        upper_bound = upper_bound;

which then fails to build the package. This doesn't happen in the CRAN version. Will check on a Win Server whether this persists.

ambuehll avatar Oct 24 '22 15:10 ambuehll

No issues to build on Win. I'll check whether a different compiler might help on mac.

ambuehll avatar Oct 24 '22 15:10 ambuehll

I'm not on a Mac but tried to address this with 4f670544313e8b2c44d423c31553384d21ece9dc removing the const specifier in the upperBound() setter for the nmf class.

Does the issue persist?

zdebruine avatar Oct 24 '22 17:10 zdebruine

works! many thanks!!!!

ambuehll avatar Oct 24 '22 19:10 ambuehll

Ok great. If you could confirm at some point that the results are similar to https://cran.r-project.org/web/packages/bvls/bvls.pdf that would be awesome!

zdebruine avatar Oct 24 '22 19:10 zdebruine

OK, so I played around a bit with a small system (900x5000). Here some findings:

  1. the bound is not respected. there are values above the upper bound (500) in the solution.

  2. Comparing to bvls, RcppML is magnitudes faster: 9s vs 256s

  3. here a plot of the bvls vs RcppML parameters image

  4. here the same plot with setting all variables above 500 to 500. image

ambuehll avatar Oct 25 '22 09:10 ambuehll

also, it would be amazing if the bounds could be set for each variable separately as in the bvls case.

ambuehll avatar Oct 25 '22 11:10 ambuehll

Reprex? I'm getting all.equal and a perfect 1:1 plot in this case:

library(RcppML)
library(bvls)
data(hawaiibirds)
A <- hawaiibirds$counts
w <- RcppML::nmf(A, 10)@w
h <- project(w, A)
upper_bound <- 2
h_bounded_RcppML <- project(w, A, upper_bound = upper_bound)
h_bounded_bvls <- apply(as.matrix(A), 2, function(b) {
  bvls::bvls(w, b, rep(0, ncol(w)), rep(upper_bound, ncol(w)))$x
})
df <- data.frame("RcppML" = as.vector(h_bounded_RcppML), "bvls" = as.vector(h_bounded_bvls))
library(ggplot2)
all.equal(df$RcppML, df$bvls)
ggplot(df, aes(bvls, RcppML)) + geom_point()

Hm, I'll look at how we could set bounds separately. That's getting a bit more complicated :)

zdebruine avatar Oct 25 '22 11:10 zdebruine

I can confirm that I it's a perfect 1:1 in your example. I'll post mine in a bit. need to re-run it...

ambuehll avatar Oct 25 '22 11:10 ambuehll

library(data.table)
library(RcppML)
library(ggplot2)

A <- readRDS("A_matrix.RDS")
B <- readRDS("B_vector.RDS")

bl = rep(0,ncol(A))
bu = rep(500,ncol(A))

system.time({
  resnnls <- RcppML::nnls(a = crossprod(A), b= crossprod(A,B),cd_maxit = 1000, upper_bound = 500)
})

system.time({
  resnnls2 = bvls(crossprod(A), crossprod(A,B), bl, bu)
})
resnnls2 <- data.table(resnnls2$x)

resnnls <- data.table(resnnls)

resnnls[,bvls:=resnnls2$V1]
resnnls[,RcppML:=V1]
# 
library(ggplot2)
ggplot(resnnls)+geom_point(aes(bvls,RcppML))

with Matrix.zip

no out of bounds, but not 1:1. We'll try to reproduce the out of bounds.

ambuehll avatar Oct 25 '22 12:10 ambuehll

First look, seems platform-dependent. On Linux I got out-of-bounds, on Windows I didn't, I have no clue why this is. Will look in a bit.

zdebruine avatar Oct 25 '22 14:10 zdebruine

Finally got a good look at this, and I think that the code .

Here's what I get when I run your example:

image

Clearly not identical answers.

But now let's calculate the squared error of both solutions:

err_bvls_bvls <- sum((B - crossprod(t(A), as.matrix(resnnls$bvls)))^2)
err_RcppML_bvls <- sum((B - crossprod(t(A), as.matrix(resnnls$RcppML)))^2)
err_bvls_bvls
# 509807.1
err_RcppML_bvls
# 461442.1

If my math is correct and all, RcppML is actually doing better. I'd be curious if this is a strange edge case or if it holds true generally. Interested to get your feedback.

zdebruine avatar Dec 15 '22 15:12 zdebruine

Closing due to inactivity and lack of theoretical clarity on how BVLS can be used within NMF.

I'm not even sure BVLS-NMF is weakly convex. For NNLS projections, BVLS can lead to suboptimal solutions with limited utility, it appears that different algorithms can discover different minima (i.e. my coordinate descent BVLS vs. the 1979 BVLS implementation in Fortran using active set methods). Further discussion welcome. I am tentatively planning to isolate BVLS in it's own function in the next major release and remove it's support in NMF or NNLS functions to simplify the codebase.

zdebruine avatar Jan 27 '23 16:01 zdebruine