Different results for multiple runs of one pairwise comparison using the same reference cell type
Hi All,
I am using scCoda to analyze changes in celltype composition for an epithelial dataset using two timepoints as covariates ('ETI_2y' and 'Healthy'). However, after running the same comparison multiple times (ETI_2y' vs 'Healthy') using the same reference celltype (set to 'automatic'), I end up with different results for some results. By investigating the results for FDR = 0.005, 0.01 and 0.05 it happens that some celltypes change in their significance depending on the runs or sometimes an additional celltype shows up for FDR = 0.05.
The distribution of celltypes looks like this:
The significant results of five runs:
My code:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sccoda.util import cell_composition_data as dat
from sccoda.util import data_visualization as viz
from sccoda.util import comp_ana as mod
import sccoda.datasets as scd
## Load Count Data
count_data = pd.read_csv('adata_ETI_2y+Healthy_counts_DF.csv', index_col='sample')
count_data
## Plot Boxplots of all groups
coda_data = dat.from_pandas(count_data, covariate_columns=['timepoint'])
with plt.rc_context({"figure.figsize": (20, 8)}):
viz.boxplots(coda_data, feature_name="timepoint")
# Overview of abundance for each celltype
ax = viz.rel_abundance_dispersion_plot(
data=coda_data,
abundant_threshold=0.9,
label_cell_types=True,
)
plt.show()
# Run scCoda
Select Reference celltype
reference_ct = 'automatic'
results = {}
# Creation of the coda model
model_coda = mod.CompositionalAnalysis(coda_data, formula="timepoint", reference_cell_type=reference_ct)
model_res = model_coda.sample_hmc()
# Saving of the results in a dictionary
results['ETI_2y+Healthy'] = model_res
# Show credible effects
model_res.credible_effects(est_fdr=0.001)
model_res.credible_effects(est_fdr=0.01)
model_res.credible_effects(est_fdr=0.05)
### Summarized results for each FDR
already_printed = dict.fromkeys(results.keys(),[])
for key, result in results.items():
print('\n\n####################')
print (key)
print('####################')
for sig in [0.001, 0.01, 0.05]:
print ('\nSignificance: ' + str(sig))
print('-----------------')
creds = result.credible_effects(est_fdr= sig)
creds_True = creds[creds == True]
for ct in creds_True.index.get_level_values(1):
if ct not in already_printed[key]:
print(ct)
already_printed[key].append(ct)
Hi @BornToBeJan !
scCODA uses MCMC sampling for parameter inference, which is not deterministic and can therefore lead to small deviations in the significance level of some cell types.
You might also find this past issue helpful: #38
@johannesostner, @Zethson:
Could you please comment if this is still a concern when using the the scCODA implementation from pertpy? I ran the same analysis twice and got identical results. Now I am wondering if I was 'lucky' and should manually set seeds to ensure determinism in future runs or if the new version takes care of that by default.
Apologies if I missed anything concerning this in the documentation.