scCODA Different results for multiple runs of one pairwise comparison using the same reference cell type

Hi All,

I am using scCoda to analyze changes in celltype composition for an epithelial dataset using two timepoints as covariates ('ETI_2y' and 'Healthy'). However, after running the same comparison multiple times (ETI_2y' vs 'Healthy') using the same reference celltype (set to 'automatic'), I end up with different results for some results. By investigating the results for FDR = 0.005, 0.01 and 0.05 it happens that some celltypes change in their significance depending on the runs or sometimes an additional celltype shows up for FDR = 0.05.

The distribution of celltypes looks like this:

The significant results of five runs:

My code:

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from sccoda.util import cell_composition_data as dat
from sccoda.util import data_visualization as viz
from sccoda.util import comp_ana as mod

import sccoda.datasets as scd

## Load Count Data

count_data = pd.read_csv('adata_ETI_2y+Healthy_counts_DF.csv', index_col='sample')
count_data

## Plot Boxplots of all groups

coda_data = dat.from_pandas(count_data, covariate_columns=['timepoint'])

with plt.rc_context({"figure.figsize": (20, 8)}):
    viz.boxplots(coda_data, feature_name="timepoint")

# Overview of abundance for each celltype

ax = viz.rel_abundance_dispersion_plot(
    data=coda_data,
    abundant_threshold=0.9,
    label_cell_types=True,
)

plt.show()

# Run scCoda

Select Reference celltype

reference_ct = 'automatic'

results = {}

# Creation of the coda model 
model_coda = mod.CompositionalAnalysis(coda_data, formula="timepoint",  reference_cell_type=reference_ct)
model_res = model_coda.sample_hmc()

# Saving of the results in a dictionary
results['ETI_2y+Healthy'] = model_res

# Show credible effects

model_res.credible_effects(est_fdr=0.001)

model_res.credible_effects(est_fdr=0.01)

model_res.credible_effects(est_fdr=0.05)

### Summarized results for each FDR

already_printed = dict.fromkeys(results.keys(),[])
for key, result in results.items():

    print('\n\n####################')
    print (key)
    print('####################')
    
    for sig in [0.001, 0.01, 0.05]:
        print ('\nSignificance: ' + str(sig))
        print('-----------------')
        creds = result.credible_effects(est_fdr= sig)
        creds_True = creds[creds == True]
        
        for ct in creds_True.index.get_level_values(1):
                if ct not in already_printed[key]:
                    print(ct)
                    already_printed[key].append(ct)

Oct 28 '24 15:10 BornToBeJan

Hi @BornToBeJan !

scCODA uses MCMC sampling for parameter inference, which is not deterministic and can therefore lead to small deviations in the significance level of some cell types.

You might also find this past issue helpful: #38

Oct 29 '24 16:10 johannesostner

@johannesostner, @Zethson:

Could you please comment if this is still a concern when using the the scCODA implementation from pertpy? I ran the same analysis twice and got identical results. Now I am wondering if I was 'lucky' and should manually set seeds to ensure determinism in future runs or if the new version takes care of that by default.

Apologies if I missed anything concerning this in the documentation.

Apr 23 '25 10:04 mschilli87