MicrobiomeStat icon indicating copy to clipboard operation
MicrobiomeStat copied to clipboard

Missing Adj Variable Term in Coefficient Table from "generate_beta_change_test_pair" function

Open eray-sahin opened this issue 7 months ago • 1 comments

Hello,

Thank you for the development of such a great package.

I have been testing your package functions for beta diversity analysis of paired samples.

When I used "generate_beta_change_test_pair" function on both my data and "peerj32.obj" from the package, I cannot see the adj variable term in the outputted coefficient tables.

An example when "peerj32.obj" was used:

library(MicrobiomeStat)
data("peerj32.obj")  

generate_beta_change_test_pair(
  data.obj = peerj32.obj,
  dist.obj = NULL,
  time.var = "time",
  subject.var = "subject",
  group.var = "group",
  adj.vars = "sex",
  change.base = "1",
  dist.name = c('BC', 'Jaccard')
)

I am obtaining the output shown in the ss below:

Image

I had a brief check on the function, and could not see "adj.vars" in the formula

 # Create formula for linear model
      formula <-
        stats::as.formula(paste0("distance", "~", paste(c(
          time_varying_info$time_varying_vars, group.var
        ), collapse = "+")))

and when I added, I could obtain the coef table with that adj variable as shown in image below:

Image

Thank you for your help in advance.

Best

eray-sahin avatar Apr 18 '25 22:04 eray-sahin

Dear @eray-sahin,

Thank you for your interest in MicrobiomeStat and for reporting this issue. I'd like to clarify how adjustment variables are handled in the generate_beta_change_test_pair function.

How Adjustment Variables Are Handled

The behavior you observed is actually by design. In the function, adjustment variables (specifically non-time-varying covariates like sex) are handled differently than you might expect:

  1. Pre-adjustment of distance matrices: Instead of including non-time-varying covariates directly in the linear model formula, we adjust the distance matrices themselves using the mStat_calculate_adjusted_distance function (lines 127-130 in the code).
# Adjust distances for non-time-varying variables
if (length(time_varying_info$non_time_varying_vars) > 0){
  dist.obj <- mStat_calculate_adjusted_distance(data.obj, dist.obj, time_varying_info$non_time_varying_vars, dist.name)
}
  1. Linear model formula: Only time-varying covariates and the group variable are included in the linear model formula (lines 170-173):
formula <-
  stats::as.formula(paste0("distance", "~", paste(c(
    time_varying_info$time_varying_vars, group.var
  ), collapse = "+")))

Statistical Rationale

This approach is intentional and statistically sound. Here's why:

  1. Distance matrices are not independent observations: Distance matrices represent pairwise relationships between samples, not independent observations. Directly including covariates in a linear model with distances as the outcome violates the independence assumption of linear regression.

  2. Proper adjustment method: The mStat_calculate_adjusted_distance function uses multidimensional scaling (MDS) to transform the distances into coordinates, adjusts these coordinates for covariates using linear models, and then recalculates the distances from the adjusted coordinates. This is a statistically appropriate way to adjust distance matrices for covariates.

  3. Separation of concerns: By adjusting the distance matrices before the main analysis, we ensure that the effect of non-time-varying covariates is properly accounted for, while keeping the main analysis focused on the variables of interest (time and group).

This is why you don't see the adjustment variables in the coefficient table - their effect has already been incorporated into the adjusted distance matrices used in the analysis.

Alternative Approach

If you're interested in seeing the direct effect of these covariates in the model, you could modify the function to include them in the linear model formula. However, please be aware that this approach may not be statistically valid for distance-based analyses.

Thank you for your understanding and for using MicrobiomeStat!

Best regards, Chen Yang

cafferychen777 avatar Apr 19 '25 08:04 cafferychen777