
Only mean centering PCA components

Open handwerkerd opened this issue 3 weeks ago • 4 comments

While playing around with ICA convergence failures, @marms7 and I realized that not variance scaling the data pre & post PCA seems to remove almost all convergence failures and (n=1) might even give better final ICA components for denoising. This is a work in progress and we're still testing, but I wanted to share here in case others have thoughts.

Changes proposed in this pull request:

  • Added a --mean_center_only option that removes variance scaling and z-scoring of the data (the two scalings are sketched just after this list)
    • I don't love this because, if we decide this will eventually be the default, we'll need to change the input option, but it's enough for me (and others) to start testing now
  • Switched RobustICA to unit-variance whitening to match the recommended fastICA default and to match what we're doing in our single call to fastICA (@BahmanTahayori & @Lestropie, any reason this might be an issue with robustICA?)
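
For concreteness, a minimal sketch of the two pre-PCA scalings being compared (illustrative only, not the actual tedana code path; the array orientation and function name are assumptions):

```python
import numpy as np
from scipy import stats


def scale_for_pca(data, mean_center_only=True):
    """Illustrative sketch: `data` is assumed to be a (voxels x time) array."""
    if mean_center_only:
        # Proposed option: remove each voxel's temporal mean,
        # but keep its native variance.
        return data - data.mean(axis=1, keepdims=True)
    # Current behaviour: z-score each voxel time series (mean 0, SD 1),
    # then z-score the whole array with a single mean/SD.
    data_z = stats.zscore(data, axis=1)
    return stats.zscore(data_z, axis=None)
```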

handwerkerd avatar Dec 04 '25 14:12 handwerkerd

Codecov Report

Patch coverage is 53.84615% with 6 lines in your changes missing coverage. Please review. Project coverage is 89.61%. Comparing base (39cd099) to head (0493576).

Files with missing lines          Patch %   Lines
tedana/decomposition/pca.py       50.00%    4 Missing and 2 partials
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1271      +/-   ##
==========================================
- Coverage   89.86%   89.61%   -0.26%     
==========================================
  Files          29       29              
  Lines        4383     4392       +9     
  Branches      725      727       +2     
==========================================
- Hits         3939     3936       -3     
- Misses        295      304       +9     
- Partials      149      152       +3     


codecov[bot] avatar Dec 04 '25 15:12 codecov[bot]

I'm more familiar with how PCA is used in the upstream sliding-window thermal denoising than with how it's used in tedana, but I'll play out this train of thought and see if it's applicable.

Some will be familiar with the noise map estimate that comes out of the sliding-window noise level estimation. This can be strongly influenced by the scanner's B1- bias field correction, which makes brain images homogeneous in brightness but heteroscedastic. Each of those individual noise level estimates is, however, derived assuming that the noise level within the finite sliding-window patch is constant, an assumption that strong bias fields / large patches may violate. This can be addressed through a variance-stabilising transform, which rescales (and potentially non-linearly transforms) the data prior to PCA based on some prior estimate of the noise level map. My own denoising tool does this iteratively, using the noise estimate from the previous iteration to standardise variance in the current iteration. From a cursory inspection, I think you're trying to achieve something similar here, except that you're using the variance of the empirical data, which is likely a less suitable reference magnitude for such a transformation than a principled noise level estimate. So permitting a noise level estimate from a prior pipeline step to propagate through and inform this step may be a more robust approach overall. Where that is not implemented or unavailable, I don't have any fundamental objections to unit variance, but am not well placed to identify any fault with it.
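
If I follow, a rough sketch of that idea would look something like the following (an assumption on my part, not dwidenoise2 or tedana code; `noise_map` is a hypothetical per-voxel noise level estimate propagated from an earlier pipeline step):

```python
import numpy as np


def stabilise_variance(data, noise_map, eps=1e-8):
    """Rescale each voxel time series by a prior noise level estimate.

    data: (voxels x time) array; noise_map: (voxels,) noise SD estimates.
    Returns the rescaled data and the scale used, so the scaling can be
    undone after PCA.
    """
    scale = np.maximum(noise_map, eps)[:, np.newaxis]
    return data / scale, scale
```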

One thing to watch out for is image voxels that are consistently zero-filled. I've recently seen data from multiple vendors where voxels outside the brain are filled with zeroes across all volumes, and this breaks PCA. Just be sure it's not the Z-transform dividing by zero and introducing NaNs before you even attempt PCA. https://github.com/Lestropie/dwidenoise2/issues/1
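
A minimal guard along those lines, assuming a (voxels x time) array (illustrative only, not the adaptive-masking code tedana actually uses):

```python
import numpy as np


def drop_constant_voxels(data):
    """Keep only voxels with nonzero temporal variance, so a later
    z-score does not divide by zero and introduce NaNs."""
    mask = data.std(axis=1) > 0
    return data[mask], mask
```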

I also recall, when investigating demeaning & whitening for fixing MP-PCA issues, that it was somewhere advocated to do an iterative double-centering approach of demeaning across both axes multiple times, as otherwise there could be order-dependency issues. https://github.com/MRtrix3/mrtrix3/pull/2363 Not sure if that's what this line is doing internally: `kept_data = stats.zscore(kept_data, axis=None)  # variance normalize everything`. But if so, it would seem to make the prior line, `kept_data = stats.zscore(kept_data, axis=1)  # variance normalize time series`, redundant?
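
For reference, one possible reading of that iterative scheme (my guess at the idea, not what `stats.zscore(..., axis=None)` does; axis=None applies a single global mean/SD rather than touching each axis):

```python
import numpy as np
from scipy import stats


def iterative_standardise(data, n_iter=5):
    """Alternately z-score along each axis so the result does not depend
    on which axis was normalised first."""
    out = np.asarray(data, dtype=float)
    for _ in range(n_iter):
        out = stats.zscore(out, axis=1)  # e.g. per voxel time series
        out = stats.zscore(out, axis=0)  # e.g. per time point across voxels
    return out
```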

Lestropie avatar Dec 05 '25 01:12 Lestropie

One thing to watch out for is image voxels that are consistently zero-filled.

I think the adaptive masking step should remove any voxels that are all zeroes, so we should be safe in this case.

Not sure if that's what this line is doing internally: `kept_data = stats.zscore(kept_data, axis=None)  # variance normalize everything`. But if so, it would seem to make the prior line, `kept_data = stats.zscore(kept_data, axis=1)  # variance normalize time series`, redundant?

The "variance normalize everything" just z-scores across the whole array with a single mean value and SD value. As far as I can recall, it doesn't have much effect since the mean across each column will be zero and the SD will be one from the previous z-score step. I'm not actually sure why we have it in the code.

tsalo avatar Dec 12 '25 13:12 tsalo

The "variance normalize everything" just z-scores across the whole array with a single mean value and SD value.

Ah OK. So having a first normalisation done independently per time series before anything else might be beneficial in the presence of a strong B1- bias field (whether intensity corrected or not), as it will help to level out differences in signal / noise level across space. I think normalising along the other axis in this case is going to correct for any temporal signal drift?

But I'm skeptical about Z-transforming the entire data matrix in one go, i.e. in an axis-agnostic way. I do have some limited recollection, seeded from this comment (unfortunately the link is dead), that normalising across each axis individually, then iterating over all axes again at least once, was beneficial for PCA. But I'm not having success finding documented justification for it.

So unless someone can either find such justification, or is willing to try it out as a potential alternative solution to the problematic data that seeded the PR, happy for the idea to be written off as out of scope.

Lestropie avatar Dec 13 '25 01:12 Lestropie

As a brief update here, @marms7 and I are trying to figure out more quantitative tests of "betterness". We saw that mean centering without any variance scaling meant ICA nearly always converged (a good thing), but we weren't sure whether the components were better. One metric we tested was a KappaRho difference score: for every component, we calculated kappa & rho values, and the score was abs(kappa - rho) / (kappa + rho). That is, a component that is more purely kappa OR rho has a higher score, which would mean there is less "mixing" of kappa and rho signal sources within each component. We then ran this for 175 runs (25 subjects) at preselected numbers of ICA components from 20-100. For z-scoring (what is currently in tedana), ICA sometimes failed to converge and then kept repeating with new seeds until one converged. The attached figure shows the median values across the 175 runs.
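
For reference, the per-component score described above in code form (a sketch; the variable names are mine):

```python
import numpy as np


def kappa_rho_difference(kappa, rho):
    """abs(kappa - rho) / (kappa + rho): near 1 for a component dominated
    by either kappa or rho, near 0 for a mixed component."""
    kappa = np.asarray(kappa, dtype=float)
    rho = np.asarray(rho, dtype=float)
    return np.abs(kappa - rho) / (kappa + rho)
```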

Since ICA has multiple components, each with a KappaRho difference score, we also needed to figure out how to combine those values for each run. The figure shows the mean across components in green and the median in pink. Since each component explains a different amount of the overall variance of the data, cyan is each component's KappaRho difference score weighted by its variance explained and summed. Particularly since we're testing ways to scale the variance of the time series, this seems to be the most appropriate option, but we included the simpler summaries anyway.
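
A sketch of those three per-run summaries (illustrative; `varex` is assumed to be each component's variance explained):

```python
import numpy as np


def summarise_run(scores, varex):
    """Combine per-component KappaRho difference scores into one value per run."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(varex, dtype=float)
    weights = weights / weights.sum()  # normalise variance explained to sum to 1
    return {
        "mean": scores.mean(),                          # green in the figure
        "median": np.median(scores),                    # pink in the figure
        "variance_weighted": np.sum(weights * scores),  # cyan in the figure
    }
```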

The solid lines show just mean centering the voxel time series, while the dashed lines also z-score (variance scale) the data.

[Figure: median_centering_kapparho_diff]

The main observation is that, with just mean centering, the KappaRho difference scores decrease as the number of components increases. For z-scoring, with the variance-weighted sum, there's less of a relationship between the difference score and the number of components, which seems to be the desired outcome. Without variance weighting the score also decreases, but that's because very low-variance components have more mixed kappa & rho values while also not contributing as much to the overall signal.

There's a bit more investigation we plan to do on this, but wanted to add an update to this PR.

handwerkerd avatar Dec 17 '25 18:12 handwerkerd