SCIPhI
How to set more permissive parameters?
Hi, I’m currently running SCIPhI on some simulated datasets, but I’m getting some peculiar patterns, and I’m not sure whether this is a bug in the tool, a misspecification of the parameter settings, or something else. The simulated data consist of mpileups with 40 cells, ~10k sites (all variable), and a sequencing depth of ~5X.
My idea is to lower the SCIPhI settings to their minimum, making the filters as permissive as possible so that most sites are picked up for phylogenetic reconstruction. I tried the following command line (in which I set the parameters controlling depth to 0):
sciphi -o test --in sampleNames -u 0 --ncf 0 --mff $minfreq --md 0 --mmw 4 --mnp 1 --ms 0 --mc 0 --unc true --mf 0 -l 200000 --seed $RANDOM ${sim}.mpileup
However this is what I get:
Reading the config file: ... done!
Reading the mpileup file: num Samples: 41
total # mut: 0 currently used: 0
normal - freq: 0.135070376076846 tmp: 0.135070376076846 SD: 0.01 count: 0 trails: 0
normal - overDis: 100 tmp: 100 SD: 5 count: 0 trails: 0
normla - alpha: 13.5070376076846 beta: 86.4929623923154
mutation - overDis: 2 tmp: 2 SD: 0.1 count: 0 trails: 0
mutation - alpha: 0.819906165230872 beta: 1.27014075215369
drop: 0.9 SD: 0.01 count: 0 trails: 0
lambda: 0 SD: 0.01 count: 0 trails: 0
1
done!
numUniqMuts: 0
dataUsage<0>: 0.1 0.1
369 0
newDataSize: 37 36.9
The new best score is: -367258.331190865
num Samples: 41
total # mut: 369 currently used: 37
[…]
As you can see, the total number of mutations identified is much lower than 10,000. So my question is: Is this just a question of low power for detection (and therefore expected), or am I setting the parameters wrong?
Thank you very much in advance, J
Hi Joao, it looks like sciphi excluded almost all mutations before running tree reconstruction: only 369 mutations appear to be left after filtering.
I have tried running sciphi on low coverage data as well, and observed similar oddities. I hacked around in the source code a bit, and I have come to the conclusion that the site filters tend to filter out many or all sites when average coverage gets near 3x. I tried disabling the filters, but I could not get sciphi to run quickly after that. I can share my version of sciphi with disabled site filters if you are interested in trying it. Perhaps you will have more luck.
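One way to check whether low coverage alone could explain the small number of retained sites is to tally depth and alternate-read support directly from the mpileup, independently of sciphi. Below is a minimal sketch, assuming the standard samtools mpileup layout (chromosome, position, reference base, then depth / bases / qualities for each sample, tab-separated). The function names `count_alt_reads` and `summarise_mpileup` and the thresholds `min_alt_reads` and `min_cells` are made up for illustration. This does not reproduce sciphi's filters; it only gives a rough upper bound on how many sites have any visible alternate support.

```python
def count_alt_reads(bases):
    """Count reads showing a mismatching base in one mpileup base string."""
    alt = 0
    i = 0
    n = len(bases)
    while i < n:
        c = bases[i]
        if c == "^":
            # Read start marker: the next character is a mapping quality.
            i += 2
        elif c in "+-":
            # Indel: +<len><seq> or -<len><seq>; skip the whole token.
            j = i + 1
            while j < n and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j] or 0)
            i = j + length
        else:
            # '.' and ',' match the reference; letters are mismatches.
            if c in "ACGTacgt":
                alt += 1
            i += 1
    return alt


def summarise_mpileup(lines, min_alt_reads=1, min_cells=1):
    """Summarise depth and alternate support over mpileup lines.

    min_alt_reads and min_cells are illustrative thresholds,
    not sciphi parameters.
    """
    n_sites = 0
    n_cells_seen = 0
    total_depth = 0
    sites_with_alt = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a valid mpileup record
        per_sample = fields[3:]
        n_samples = len(per_sample) // 3
        cells_with_alt = 0
        for k in range(n_samples):
            total_depth += int(per_sample[3 * k])
            if count_alt_reads(per_sample[3 * k + 1]) >= min_alt_reads:
                cells_with_alt += 1
        n_sites += 1
        n_cells_seen += n_samples
        if cells_with_alt >= min_cells:
            sites_with_alt += 1
    mean_depth = total_depth / n_cells_seen if n_cells_seen else 0.0
    return {
        "sites": n_sites,
        "mean_depth": mean_depth,
        "sites_with_alt": sites_with_alt,
    }


# Two toy sites with two cells each (made-up data, not from the simulation).
demo = [
    "chr1\t100\tA\t3\t.,T\tIII\t2\t..\tII",
    "chr1\t101\tC\t0\t*\t*\t4\t^I.+2AG.,\tIIII",
]
summary = summarise_mpileup(demo)
print(summary)
```

To run this on a real file, pass an open file handle instead of `demo`, for example `summarise_mpileup(open(path))` with `path` pointing at the mpileup. If `sites_with_alt` is already far below the number of simulated sites, the data themselves carry little signal at this depth; if it is close to 10k, the loss is more likely happening in the filtering step.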
Also, you might try https://github.com/raphael-group/SBMClone instead. It performs surprisingly well in my hands on simulated data.
Thanks @winni2k! Will definitely take a look at SBMClone. :)