
Significant variation in GRNBoost2 results with minor cell subsampling (removing one cell) in pySCENIC

Open jklupup opened this issue 6 months ago • 1 comments

When running the GRN step in pySCENIC, I observed substantial differences in the output adjacencies.csv after removing just one cell from the expression matrix. Specifically:

Using the full expression matrix (e.g., thousands of cells) vs. a matrix missing one cell yields only 56.21% overlap in TF-target pairs. This level of variability seems unexpectedly high for a dataset of this scale.

I wonder whether something is wrong with my code.

Code:

if [ ! -f grn.SUCCESS ]; then
    arboreto_with_multiprocessing.py \
      "$count_loom" \
      "$tf_list" \
      --num_workers 16 \
      --output adjacencies.csv \
      --method grnboost2 \
      --sparse \
      --seed 1 \
    && touch grn.SUCCESS
fi

if [ ! -f grn.SUCCESS ]; then echo "grn error"; exit 1; fi

jklupup avatar Jul 15 '25 02:07 jklupup

You would probably see the same variation if you kept the same expression matrix and only ran with different seeds.

ghuls avatar Jul 28 '25 10:07 ghuls
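
One way to check the suggestion above is to run the GRN step twice on the same expression matrix with different `--seed` values and compare the two outputs with the same overlap measure. The following is only a minimal sketch: it assumes the adjacencies file has the columns `TF`, `target`, and `importance`, and since the thread does not say how the 56.21% figure was computed, it reports several candidate definitions of overlap. The function names and the `top_n` option are hypothetical helpers, not part of pySCENIC.

```python
import csv


def load_pairs(path, top_n=None):
    """Read (TF, target) pairs from an adjacencies CSV.

    Assumes columns named TF, target and importance. If top_n is given,
    only the top_n pairs ranked by importance are kept.
    """
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    # Rank by importance so that top_n keeps the strongest links.
    rows.sort(key=lambda row: float(row["importance"]), reverse=True)
    if top_n is not None:
        rows = rows[:top_n]
    return {(row["TF"], row["target"]) for row in rows}


def overlap_stats(pairs_a, pairs_b):
    """Return several overlap measures between two sets of TF-target pairs."""
    shared = pairs_a & pairs_b
    union = pairs_a | pairs_b
    return {
        "shared": len(shared),
        # Share of run A's pairs that also appear in run B, and vice versa.
        "fraction_of_a": len(shared) / len(pairs_a) if pairs_a else 0.0,
        "fraction_of_b": len(shared) / len(pairs_b) if pairs_b else 0.0,
        # Jaccard index: shared pairs divided by all distinct pairs.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }
```

For example, with two runs saved under the hypothetical names `adjacencies_seed1.csv` and `adjacencies_seed2.csv`, calling `overlap_stats(load_pairs("adjacencies_seed1.csv"), load_pairs("adjacencies_seed2.csv"))` gives a seed-only baseline. If that baseline is close to the overlap seen after removing one cell, the variation is more likely due to the stochastic fitting than to the missing cell. Passing `top_n` restricts the comparison to the highest-importance pairs in each run.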