mppca does not follow the canonical algorithm internally
Description
The code for mppca seems to be a weird hybrid with lpca as they both use a common function, and thus does not really follow what it is supposed to do with what the manuscript says. Also, the internals for computing the threshold in terms of eigenvalues has a plain math mistake. I guess it didn't show up because internally it still does just a hard threshold, but you won't encounter it until you try to use it noisy data, probably why it went under the radar.
Anyway, I'm not sure if if a design choice or if that was tacked on later as both mppca and lpca use a lot of stuff in common, but they work fundamentally differently and you end up with a non standard mppca because of that. In particular, the one here seems a tad too aggressive because it overestimates the threshold I'd guess. In this line the formula should use a divide instead of a minus https://github.com/dipy/dipy/blob/master/dipy/denoise/localpca.py#L45, and probably also a rework since it doesn't seem to follow eq. 10 to 12 from the manuscript.
There's also a lot of other small stuff I'm not sure why it is there or done this way, but someone should probably look it over again as of now if just doesn't do what it is supposed to do, and you can't really compare it with other implementations because it does not seem to follow what one expects. That doesn't mean it is necessarily wrong, just unclear if it's a mistake or by design, but at least it does not do what one expects right now. Even feeding back in our own value of sigma, it also changes it again for the thresholding without much explanation https://github.com/dipy/dipy/blob/master/dipy/denoise/localpca.py#L195
Anyway, it at least requires some work to fix the threshold and probably some overhaul to make it behave as expected since currently it mixes two things that are designed differently. Probably other stuff I missed also as https://github.com/dipy/dipy/issues/2655 seems to have problems with some internal parts in addition.
Hi @samuelstjean,
Thanks for letting us know, we will look into it closer. Maybe @RafaelNH has an opinion since he implements this part.
Also @ShreyasFadnavis is working on denoising so he might give you an update when he find time.
Thanks for the feedback.
just as a reference: #1917
Also, as in my recent issue #2655 (which you mention), there seems to be other additional errors related to estimation of the noise threshold. I have a fairly good understanding of the original algorithms and PCA in general, I will also be looking at this again now that I have some extra time.
If you are correct, then the noise estimation works on rows, and the thresholding works on columns (or the other way around), but there is no distinction here it seems, so it would just need a major rework all around in that case.