
scope to support semi NMF?

Open klin333 opened this issue 2 months ago • 3 comments

Hi,

Thank you for this great package!

Is there any scope to include semi-nonnegative matrix factorization (semi-NMF)? Your docs seem to suggest the ability to allow negative values in the matrix being factorized, but the functions no longer seem to support it. Some reference implementations:

  • A Python implementation based on the original paper, Ding et al., "Convex and Semi-Nonnegative Matrix Factorizations", at https://github.com/cthurau/pymf/blob/master/pymf/snmf.py
  • A Matlab implementation based on what appears to be an improved method, Gillis et al., "Exact and Heuristic Algorithms for Semi-Nonnegative Matrix Factorization", at https://gitlab.com/ngillis/nmfbook/-/tree/master/algorithms/semi-NMF?ref_type=heads

The Matlab implementation appears closest to the method used here: alternating least squares, but with the basis matrix update step replaced by ordinary least squares, since the basis matrix is allowed to be negative.
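Concretely, writing the model as X = FG with F the mixed-sign basis and G the non-negative coefficient matrix (just restating the approach in those references), each sweep of that alternating scheme solves two subproblems:

$$F \leftarrow \arg\min_{F} \lVert X - F G \rVert_F^2 \quad \text{(ordinary least squares, } F \text{ unconstrained)}$$

$$G \leftarrow \arg\min_{G \ge 0} \lVert X - F G \rVert_F^2 \quad \text{(non-negative least squares)}$$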

klin333 avatar Apr 10 '24 23:04 klin333

Yes, I used to support unconstrained "N"MF, but found that the properties of this method were not useful in practice:

  • Not robust (less unique)
  • Uninterpretable
  • Not applicable to non-negative data
  • Increased bounds of search space, increased convergence complexity, longer runtime

I'm curious: what are the applications for this?

In theory it would be quite easy to allow some amount of negativity: on line 23 of RcppML/nnls.hpp we would do if (-diff > h(i, sample) - alpha) where alpha is the tolerable amount of negativity. Now, alpha as a constant is only useful when scaled to the system size, and that becomes a difficult threshold to set because NMF alternates NNLS solves, so what works for one iteration will change by the next iteration, and so on.

I have also experimented with BVLS (bounded-variable least squares), but the same issue applies.
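To make that idea concrete, here is a toy R analog (emphatically not the actual nnls.hpp code): coordinate descent on a least squares problem where each coefficient is clamped at -alpha rather than 0, so alpha = 0 recovers NNLS and a very large alpha recovers unconstrained least squares.

```r
# Toy analog of the relaxed constraint discussed above (not RcppML's solver):
# minimize || b - A x ||^2 subject to x >= -alpha, by projected coordinate descent.
bounded_cd <- function(A, b, alpha = 0, maxit = 500, tol = 1e-10) {
  AtA <- crossprod(A)
  Atb <- as.vector(crossprod(A, b))
  x <- numeric(ncol(A))
  for (it in seq_len(maxit)) {
    x_old <- x
    for (i in seq_along(x)) {
      # exact coordinate minimizer, then projection onto the lower bound -alpha
      xi <- x[i] - (sum(AtA[i, ] * x) - Atb[i]) / AtA[i, i]
      x[i] <- max(xi, -alpha)
    }
    if (max(abs(x - x_old)) < tol) break
  }
  x
}
```

Adding an upper-bound clamp in the same loop gives the BVLS variant, and the same question of how to scale alpha (or the bounds) across iterations shows up either way.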

The thought occurs that this is similar to a leaky ReLU activation in a tied-weights autoencoder without biases, plus an L2 penalty (weight.decay) on the weights matrix. Would that be an online analog of semi-NMF?

zdebruine avatar Apr 11 '24 02:04 zdebruine

Hi,

Semi-NMF may be useful in financial applications, e.g. financial returns data, which are mixed-sign (positive and negative). The original semi-NMF paper by Ding et al. did mention that simply adding an offset to such data to make it all positive is suboptimal. In X = FG, allowing both X and F to be mixed-sign while keeping G non-negative can still lead to interesting decompositions into parts.

Anyhow, just an idea, since I couldn't find any other R package supporting semi-NMF. I ended up hacking together a ChatGPT-translated version of the Matlab scripts above, using RcppML::nnls for G and OLS for F in the alternating least squares solver, roughly along the lines of the sketch below. It seemed to work OK, so I'm happy for this issue to be closed.
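A minimal, self-contained version of that kind of loop (not my actual script; it uses the CRAN nnls package in place of RcppML::nnls so it runs standalone, and the notation follows X = FG above):

```r
library(nnls)  # CRAN Lawson-Hanson NNLS solver: install.packages("nnls")

semi_nmf <- function(X, k, maxit = 100, tol = 1e-4) {
  m <- ncol(X)
  # G: k x m non-negative coefficients (random init); Fm: n x k mixed-sign basis
  G <- matrix(runif(k * m), k, m)
  err_prev <- Inf
  err <- NA
  for (it in seq_len(maxit)) {
    # F-update: unconstrained (ordinary) least squares, F = X G' (G G')^{-1}
    Fm <- X %*% t(G) %*% solve(tcrossprod(G))
    # G-update: column-wise NNLS, min_{g >= 0} || X[, j] - Fm g ||^2
    for (j in seq_len(m)) G[, j] <- nnls(Fm, X[, j])$x
    err <- norm(X - Fm %*% G, "F")
    if (abs(err_prev - err) / (err + 1e-12) < tol) break
    err_prev <- err
  }
  list(F = Fm, G = G, error = err)
}

# Example on mixed-sign data:
# set.seed(1); X <- matrix(rnorm(200 * 20), 200, 20)
# fit <- semi_nmf(X, k = 3)
```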

Thank you

klin333 avatar Apr 11 '24 05:04 klin333

No, this is really good. I'll leave this issue open so that when I get time to update RcppML this summer we can add a parameter nonneg = c(bool, bool) that enforces the non-negativity constraints on c(w, h). By default this will be c(TRUE, TRUE), but there may be interesting applications apart from the default case.
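Purely to illustrate the intended interface (hypothetical: nothing in the package supports this yet, and the argument names and positions here are illustrative only):

```r
# Hypothetical future call per the proposal above (not the current RcppML API):
# unconstrained w, non-negative h
fit <- RcppML::nmf(A, k = 10, nonneg = c(FALSE, TRUE))
```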

Thanks!

zdebruine avatar Apr 11 '24 12:04 zdebruine