scCODA icon indicating copy to clipboard operation
scCODA copied to clipboard

Different results for multiple runs of one pairwise comparison using the same reference cell type

Open BornToBeJan opened this issue 1 year ago • 2 comments

Hi All,

I am using scCoda to analyze changes in celltype composition for an epithelial dataset using two timepoints as covariates ('ETI_2y' and 'Healthy'). However, after running the same comparison multiple times (ETI_2y' vs 'Healthy') using the same reference celltype (set to 'automatic'), I end up with different results for some results. By investigating the results for FDR = 0.005, 0.01 and 0.05 it happens that some celltypes change in their significance depending on the runs or sometimes an additional celltype shows up for FDR = 0.05.

The distribution of celltypes looks like this:

Image

The significant results of five runs:

Image

My code:

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from sccoda.util import cell_composition_data as dat
from sccoda.util import data_visualization as viz
from sccoda.util import comp_ana as mod

import sccoda.datasets as scd

## Load Count Data

count_data = pd.read_csv('adata_ETI_2y+Healthy_counts_DF.csv', index_col='sample')
count_data

## Plot Boxplots of all groups

coda_data = dat.from_pandas(count_data, covariate_columns=['timepoint'])

with plt.rc_context({"figure.figsize": (20, 8)}):
    viz.boxplots(coda_data, feature_name="timepoint")

# Overview of abundance for each celltype

ax = viz.rel_abundance_dispersion_plot(
    data=coda_data,
    abundant_threshold=0.9,
    label_cell_types=True,
)

plt.show()

# Run scCoda

Select Reference celltype

reference_ct = 'automatic'

results = {}

# Creation of the coda model 
model_coda = mod.CompositionalAnalysis(coda_data, formula="timepoint",  reference_cell_type=reference_ct)
model_res = model_coda.sample_hmc()

# Saving of the results in a dictionary
results['ETI_2y+Healthy'] = model_res

# Show credible effects

model_res.credible_effects(est_fdr=0.001)

model_res.credible_effects(est_fdr=0.01)

model_res.credible_effects(est_fdr=0.05)

### Summarized results for each FDR

already_printed = dict.fromkeys(results.keys(),[])
for key, result in results.items():

    print('\n\n####################')
    print (key)
    print('####################')
    
    for sig in [0.001, 0.01, 0.05]:
        print ('\nSignificance: ' + str(sig))
        print('-----------------')
        creds = result.credible_effects(est_fdr= sig)
        creds_True = creds[creds == True]
        
        for ct in creds_True.index.get_level_values(1):
                if ct not in already_printed[key]:
                    print(ct)
                    already_printed[key].append(ct)

BornToBeJan avatar Oct 28 '24 15:10 BornToBeJan

Hi @BornToBeJan !

scCODA uses MCMC sampling for parameter inference, which is not deterministic and can therefore lead to small deviations in the significance level of some cell types.

You might also find this past issue helpful: #38

johannesostner avatar Oct 29 '24 16:10 johannesostner

@johannesostner, @Zethson:

Could you please comment if this is still a concern when using the the scCODA implementation from pertpy? I ran the same analysis twice and got identical results. Now I am wondering if I was 'lucky' and should manually set seeds to ensure determinism in future runs or if the new version takes care of that by default.

Apologies if I missed anything concerning this in the documentation.

mschilli87 avatar Apr 23 '25 10:04 mschilli87