MicrobiomeStat
MicrobiomeStat copied to clipboard
Missing Adj Variable Term in Coefficient Table from "generate_beta_change_test_pair" function
Hello,
Thank you for the development of such a great package.
I have been testing your package functions for beta diversity analysis of paired samples.
When I used "generate_beta_change_test_pair" function on both my data and "peerj32.obj" from the package, I cannot see the adj variable term in the outputted coefficient tables.
An example when "peerj32.obj" was used:
library(MicrobiomeStat)
data("peerj32.obj")
generate_beta_change_test_pair(
data.obj = peerj32.obj,
dist.obj = NULL,
time.var = "time",
subject.var = "subject",
group.var = "group",
adj.vars = "sex",
change.base = "1",
dist.name = c('BC', 'Jaccard')
)
I am obtaining the output shown in the ss below:
I had a brief check on the function, and could not see "adj.vars" in the formula
# Create formula for linear model
formula <-
stats::as.formula(paste0("distance", "~", paste(c(
time_varying_info$time_varying_vars, group.var
), collapse = "+")))
and when I added, I could obtain the coef table with that adj variable as shown in image below:
Thank you for your help in advance.
Best
Dear @eray-sahin,
Thank you for your interest in MicrobiomeStat and for reporting this issue. I'd like to clarify how adjustment variables are handled in the generate_beta_change_test_pair function.
How Adjustment Variables Are Handled
The behavior you observed is actually by design. In the function, adjustment variables (specifically non-time-varying covariates like sex) are handled differently than you might expect:
- Pre-adjustment of distance matrices: Instead of including non-time-varying covariates directly in the linear model formula, we adjust the distance matrices themselves using the
mStat_calculate_adjusted_distancefunction (lines 127-130 in the code).
# Adjust distances for non-time-varying variables
if (length(time_varying_info$non_time_varying_vars) > 0){
dist.obj <- mStat_calculate_adjusted_distance(data.obj, dist.obj, time_varying_info$non_time_varying_vars, dist.name)
}
- Linear model formula: Only time-varying covariates and the group variable are included in the linear model formula (lines 170-173):
formula <-
stats::as.formula(paste0("distance", "~", paste(c(
time_varying_info$time_varying_vars, group.var
), collapse = "+")))
Statistical Rationale
This approach is intentional and statistically sound. Here's why:
-
Distance matrices are not independent observations: Distance matrices represent pairwise relationships between samples, not independent observations. Directly including covariates in a linear model with distances as the outcome violates the independence assumption of linear regression.
-
Proper adjustment method: The
mStat_calculate_adjusted_distancefunction uses multidimensional scaling (MDS) to transform the distances into coordinates, adjusts these coordinates for covariates using linear models, and then recalculates the distances from the adjusted coordinates. This is a statistically appropriate way to adjust distance matrices for covariates. -
Separation of concerns: By adjusting the distance matrices before the main analysis, we ensure that the effect of non-time-varying covariates is properly accounted for, while keeping the main analysis focused on the variables of interest (time and group).
This is why you don't see the adjustment variables in the coefficient table - their effect has already been incorporated into the adjusted distance matrices used in the analysis.
Alternative Approach
If you're interested in seeing the direct effect of these covariates in the model, you could modify the function to include them in the linear model formula. However, please be aware that this approach may not be statistically valid for distance-based analyses.
Thank you for your understanding and for using MicrobiomeStat!
Best regards, Chen Yang