clonevol
NA error
Hi!
I've recently used SciClone on a primary/relapse sample pair. In the output table there are some NA values which, as far as I know, correspond to mutations that are not shared between the samples.
Here is an example of the data frame:
chr st primary.ref primary.var primary.vaf primary.cn
100 1 56790773 0 0 0.00 NA
101 1 57427557 39 18 31.58 2
102 1 58035059 22 9 29.03 2
primary.cleancn primary.depth relapse.ref relapse.var relapse.vaf
100 NA 0 47 19 28.79
101 2 57 42 18 30.00
102 2 31 29 8 21.62
relapse.cn relapse.cleancn relapse.depth adequateDepth cluster
100 2 2 66 0 NA
101 2 2 60 1 2
102 2 2 37 1 2
cluster.prob.1 cluster.prob.2
100 NA NA
101 0.0156438858 0.9843561
102 0.0002460844 0.9997539
If I try to use infer.clonal.models with these results as:
>df
cluster primary.vaf primary.depth relapse.vaf relapse.depth
100 NA 0.00 0 28.79 66
101 2 31.58 57 30.00 60
102 2 29.03 31 21.62 37
x <- infer.clonal.models(variants=df,
    cluster.col.name="cluster",
    vaf.col.names=vaf.col.names,
    subclonal.test="none",
    subclonal.test.model="none",
    cluster.center="mean",
    model = 'monoclonal',
    vaf.in.percent = TRUE,
    founding.cluster=1,
    min.cluster.vaf=0.01,
    p.value.cutoff=0.05)
I got the following error:
Sample 1: primary.vaf <-- primary.vaf
Sample 2: relapse.vaf <-- relapse.vaf
Using monoclonal model
primary.vaf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters: NA,NA
Error in if (v[i, ]$excluded) { : missing value where TRUE/FALSE needed
Therefore I removed the rows containing NA:
> df <- na.omit(df)
Then I ran it again and got a different error:
Sample 1: primary.vaf <-- primary.vaf
Sample 2: relapse.vaf <-- relapse.vaf
Using monoclonal model
primary.vaf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters:
primary.vaf : 1 clonal architecture model(s) found
relapse.vaf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters:
relapse.vaf : 1 clonal architecture model(s) found
Finding matched clonal architecture models across samples...
Found 1 compatible model(s)
Merging clonal evolution trees across samples...
Error in ci$sample.with.cell.frac.ci[cia$is.zero.cell.frac] = paste0("°", :
NAs are not allowed in subscripted assignments
I would be grateful if you could help me solve this error. Apart from that, after running SciClone, is it recommended to do the subclonal test with bootstrapping? What is the point of running clonevol with subclonal.test="bootstrap" and subclonal.test.model="non-parametric"?
If the idea is to run fishplot after clonevol, should I use the rescale.vaf function? If so, how?
Thank you in advance!
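For readers hitting the first error ("missing value where TRUE/FALSE needed"), the log line "Non-positive VAF clusters: NA,NA" suggests that the variant with an NA cluster is what trips the check. The following is a minimal base-R sketch, not clonevol's own method: it rebuilds the three rows printed above as a toy data frame and drops only the variants without a cluster assignment. Unlike na.omit, which removes a row if any column is NA, this filter looks at the cluster column alone. The names clustered and dropped are illustrative.

```r
# Toy data frame reproducing the three rows shown in the post above.
df <- data.frame(
  cluster       = c(NA, 2, 2),
  primary.vaf   = c(0.00, 31.58, 29.03),
  primary.depth = c(0, 57, 31),
  relapse.vaf   = c(28.79, 30.00, 21.62),
  relapse.depth = c(66, 60, 37),
  row.names     = c("100", "101", "102")
)

# Keep only variants that were assigned to a cluster.
clustered <- df[!is.na(df$cluster), , drop = FALSE]

# Keep the removed variants around so the filtering can be inspected.
dropped <- df[is.na(df$cluster), , drop = FALSE]
message(nrow(dropped), " variant(s) without a cluster assignment removed")

print(clustered)
```

This only addresses the NA cluster values; it does not by itself resolve the second error discussed in the rest of the thread.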
This may be a bug. Could you share some reproducible data and code? Thanks.
I cannot share the data but I will try to give you a dataset with same problem.
Hi!
The attached file contains the SciClone results for a primary and relapse pair of samples from this paper: http://www.pnas.org/content/113/40/11306 . I ran clonevol like this:
library(clonevol)
library(fishplot)
df = read.table('results.txt', header=TRUE, sep = '\t')
df <- na.omit(df)
vaf.col.names <- grep("*.vaf", colnames(df), value = TRUE)
x <- infer.clonal.models(variants=df,
    cluster.col.name="cluster",
    vaf.col.names=vaf.col.names,
    subclonal.test="none",
    subclonal.test.model="none",
    cluster.center="mean",
    model = 'monoclonal',
    vaf.in.percent = TRUE,
    founding.cluster=1,
    min.cluster.vaf=0.01,
    p.value.cutoff=0.05)
and it gave the same error:
Sample 1: primary.vaf <-- primary.vaf
Sample 2: relapse.vaf <-- relapse.vaf
Using monoclonal model
primary.vaf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters:
primary.vaf : 2 clonal architecture model(s) found
relapse.vaf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.01
Non-positive VAF clusters:
relapse.vaf : 2 clonal architecture model(s) found
Finding matched clonal architecture models across samples...
Found 2 compatible model(s)
Merging clonal evolution trees across samples...
Error in ci$sample.with.cell.frac.ci[cia$is.zero.cell.frac] = paste0("°", :
NAs are not allowed in subscripted assignments
Thanks! It would also be great if you could tell me when the bootstrap test is worth it and whether I should use it in this case.
Attachment: results.txt
It turned out that the bootstrap was not performed, so no cellular fraction was estimated, and that is what produced the error. I'll add this to the list of bugs to be fixed. Thanks for your report.
I recommend using the bootstrap, because it allows the cellular fraction to be estimated and then used to interpret the models. I am working on the technical details and will post them soon.
Thanks.
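As a rough illustration of the recommendation above, the sketch below reuses the arguments from the call posted earlier and changes only the two subclonal-test settings to the values quoted in the original question (subclonal.test="bootstrap", subclonal.test.model="non-parametric"). It is an assumption that no other argument needs to change. The helper name run.with.bootstrap and the list bootstrap.args are hypothetical, and the helper only calls clonevol if the package is installed.

```r
# Arguments taken from the call posted earlier in this thread; only
# subclonal.test and subclonal.test.model differ from that call.
bootstrap.args <- list(
  cluster.col.name     = "cluster",
  vaf.col.names        = c("primary.vaf", "relapse.vaf"),
  subclonal.test       = "bootstrap",
  subclonal.test.model = "non-parametric",
  cluster.center       = "mean",
  model                = "monoclonal",
  vaf.in.percent       = TRUE,
  founding.cluster     = 1,
  min.cluster.vaf      = 0.01,
  p.value.cutoff       = 0.05
)

# Hypothetical wrapper: runs infer.clonal.models on a variant data frame
# with the settings above, and fails early if clonevol is not available.
run.with.bootstrap <- function(df) {
  if (!requireNamespace("clonevol", quietly = TRUE)) {
    stop("the clonevol package is not installed")
  }
  do.call(clonevol::infer.clonal.models,
          c(list(variants = df), bootstrap.args))
}
```

With a cleaned data frame, the call would be x <- run.with.bootstrap(df).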
Did you ever fix this bug? I am having the same issue.
Sorry, not yet. Is there a specific reason why you chose not to run the bootstrap?
Sorry, I thought I was. I had subclonal.test="bootstrap" but I hadn't changed the value of min.cluster.vaf to NULL. Once I did, the error went away.
I don't think min.cluster.vaf is the root cause. What did you use as min.cluster.vaf before?
It was the value copied straight from the usage details -0.01.
I've run clonevol for a few samples now, using PyClone results as the input, but I can't determine any clonal models for any sample. I am running this for single samples (none of my samples are related). Would this be causing the problems? Can clonevol work using single samples?
You meant 0.01, correct? For PyClone, please see https://github.com/hdng/clonevol/issues/4. I often see PyClone produce too many clusters (this could be due to parameter settings). Before running clonevol, it is important to evaluate the clusters visually and clean them up (e.g. removing clusters with a small number of variants, or removing clusters that look like outliers relative to the other clusters), or even to rerun PyClone a couple of times to obtain the best clustering.
My experience with PyClone is that it tends to overestimate the number of clones/clusters in the sample unless you have the type of input data PyClone was designed for. I would not use PyClone unless I had a deep targeted sequencing panel of genes producing between 100 and 1000 SNVs per sample with a mean coverage of 100x or more. Having allele-specific copy number estimates and the purity really helps PyClone produce well-corrected VAFs (CCF). Apart from that, sometimes it is just a matter of running more iterations until the chains converge...
I have the same issue. When I ran the bootstrap, it reported that no model was found for one of the two samples. So I removed the bootstrap and came across the same error.
I always recommend running the bootstrap. If it does not find a model, the non-bootstrap run won't find one either. When no model is found for a sample, it indicates a clustering issue, or that the data in that sample are noisier than clonevol's default tolerance.
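One of the cleanup steps mentioned above, removing clusters with few variants, can be sketched in base R as follows. This is only an illustration: the data frame is made up, and the threshold min.variants is a hypothetical value rather than anything clonevol or PyClone prescribes. It does not replace the visual evaluation of the clusters.

```r
# Made-up cluster assignments; cluster 3 contains a single variant.
df <- data.frame(
  cluster     = c(1, 1, 1, 2, 2, 2, 2, 3),
  primary.vaf = c(48, 51, 47, 30, 29, 32, 31, 5),
  relapse.vaf = c(50, 49, 46, 22, 21, 25, 20, 40)
)

# Hypothetical minimum number of variants a cluster must contain.
min.variants <- 3

# Count variants per cluster and keep the ids of sufficiently large ones.
sizes <- table(df$cluster)
keep <- names(sizes)[sizes >= min.variants]

# Drop variants belonging to the small clusters.
filtered <- df[as.character(df$cluster) %in% keep, , drop = FALSE]

print(sizes)
print(filtered)
```

After a filter like this, it is worth checking whether the remaining cluster ids still match what the downstream call expects, for example the value passed as founding.cluster.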
To find out what may be wrong with the clustering, please see Step 3: Evaluating the variant clustering results of the vignette (https://raw.githubusercontent.com/hdng/clonevol/master/vignettes/clonevol.pdf).