SURVIVOR
behavior of SURVIVOR merge on multiple samples from multiple callers
Thank you for making such a useful tool!!
We have multiple samples, and each sample has multiple VCFs generated by different SV callers. Based on the discussion in #95, we should do the following:
STEP 1: merge all vcfs for the same sample into 1 vcf per sample with SURVIVOR merge
The result is a multi-column vcf, with headers copied from the individual vcfs:
Resulting vcf for Sample1:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1_manta Sample1_delly Sample1_smoove
I 20105 101 N <DEL> 180 PASS SUPP=2;SUPP_VEC=101;SVLEN=95;SVTYPE=DEL;SVMETHOD=SURVIVOR1.0.7;CHR2=I;END=20163;CIPOS=0,819;CIEND=0,892;STRANDS=+- GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO 0/1:NA:131:10,0:--:180:INV:INV00000001:NA:NA:I_20924-I_21055 ./.:NaN:0:0,0:--:NaN:NaN:NaN:NAN:NAN:NAN 0/0:NA:58:0,7:+-:0:DEL:101:NA:NA:I_20105-I_20163
For Sample2:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample2_manta Sample2_delly Sample2_smoove
I 20105 DEL000SUR AAATTTTTTTTCCGCAAAATCAGGAAAAATTCAGAAAAAGACAGTCAAAAAATTGTAGA ATC 601 PASS SUPP=3;SUPP_VEC=111;SVLEN=82;SVTYPE=DEL;SVMETHOD=SURVIVOR1.0.7;CHR2=I;END=20163;CIPOS=0,819;CIEND=0,892;STRANDS=+- GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO 0/0:NA:131:11,0:--:180:INV:INV00000001:NA:NA:I_20924-I_21055 1/1:NA:58:0,18:+-:601:DEL:I_20105_20163_-58:AAATTTTTTTTCCGCAAAATCAGGAAAAATTCAGAAAAAGACAGTCAAAAAATTGTAGA:ATC:I_20105-I_20163 1/1:NA:58:0,7:+-:354:DEL:101:NA:NA:I_20105-I_20163
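To compare the per-caller genotypes in a merged record like the ones above, something along these lines could be used. It is only a sketch: it assumes the fields are whitespace-separated as pasted here (real VCFs are tab-separated, which `split()` also handles), the INFO column is shortened for readability, and the helper name `genotypes_by_column` is made up for illustration.

```python
def genotypes_by_column(header_line, record_line):
    """Map each sample column name to its GT value for one merged record."""
    columns = header_line.lstrip("#").split()
    fields = record_line.split()
    # Column 9 (index 8) is FORMAT; sample columns start at index 9.
    format_keys = fields[8].split(":")
    gt_index = format_keys.index("GT")
    return {
        name: value.split(":")[gt_index]
        for name, value in zip(columns[9:], fields[9:])
    }


header = ("#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT "
          "Sample1_manta Sample1_delly Sample1_smoove")
# Sample1 record from STEP 1, with INFO shortened for readability.
record = ("I 20105 101 N <DEL> 180 PASS "
          "SUPP=2;SUPP_VEC=101;SVLEN=95;SVTYPE=DEL "
          "GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO "
          "0/1:NA:131:10,0:--:180:INV:INV00000001:NA:NA:I_20924-I_21055 "
          "./.:NaN:0:0,0:--:NaN:NaN:NaN:NAN:NAN:NAN "
          "0/0:NA:58:0,7:+-:0:DEL:101:NA:NA:I_20105-I_20163")

for column, gt in genotypes_by_column(header, record).items():
    print(column, gt)
```

For the Sample1 record this lists 0/1 for manta, ./. for delly, and 0/0 for smoove, matching the sample columns shown above.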
STEP 2: combine the 1 vcf per sample for all samples with SURVIVOR merge
The column headers were taken from the first sample column of each input VCF above, but the fields seem to have been recomputed by combining the columns within each input VCF.
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1_manta Sample2_manta
I 20105 DEL000SUR AAATTTTTTTTCCGCAAAATCAGGAAAAATTCAGAAAAAGACAGTCAAAAAATTGTAGA ATC 601 PASS SUPP=2;SUPP_VEC=11;SVLEN=-89;SVTYPE=DEL;SVMETHOD=SURVIVOR1.0.7;CHR2=I;END=20163;CIPOS=0,0;CIEND=0,0;STRANDS=+- GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO 0/1:101:95:0,0:+-:180:DEL:101:NA:NA:I_20105-I_20163 1/1:111:82:0,0:+-:601:DEL:DEL000SUR:AAATTTTTTTTCCGCAAAATCAGGAAAAATTCAGAAAAAGACAGTCAAAAAATTGTAGA:ATC:I_20105-I_20163
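For reading the SUPP_VEC values in these records, here is a small sketch. It assumes SUPP_VEC holds one character per input file, in the same order as the sample columns, which is how I understand the field; `supporting_columns` is a hypothetical helper name, not part of SURVIVOR.

```python
def supporting_columns(info, sample_names):
    """Return the sample columns whose SUPP_VEC position is '1'."""
    info_map = dict(
        item.split("=", 1) for item in info.split(";") if "=" in item
    )
    supp_vec = info_map["SUPP_VEC"]
    # Assumption: position i of SUPP_VEC corresponds to sample column i.
    return [name for bit, name in zip(supp_vec, sample_names) if bit == "1"]


# STEP 1, Sample1: SUPP_VEC=101 over three caller columns.
step1_info = "SUPP=2;SUPP_VEC=101;SVLEN=95;SVTYPE=DEL"
step1_names = ["Sample1_manta", "Sample1_delly", "Sample1_smoove"]
print(supporting_columns(step1_info, step1_names))

# STEP 2: SUPP_VEC=11 over the two per-sample columns.
step2_info = "SUPP=2;SUPP_VEC=11;SVLEN=-89;SVTYPE=DEL"
step2_names = ["Sample1_manta", "Sample2_manta"]
print(supporting_columns(step2_info, step2_names))
```

Under that assumption, the STEP 1 record for Sample1 is supported by the manta and smoove columns, and the STEP 2 record by both per-sample inputs. The PSV values in the STEP 2 sample columns (101 and 111) also match the STEP 1 SUPP_VEC values.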
I was a bit confused by this statement from the discussion in #127:
"no SURVIVOR does not take the GT into account as many tools often dont report the GT."
This was referring to STEP 1, right? In that case the GT field was simply copied over from the input VCFs, whereas in STEP 2 it looks like the GT most different from REF was kept when merging the calls for the same sample (1/1 > 1/0 > 0/0)?
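To make that guess concrete, here is a sketch that applies a "keep the most non-reference GT" rule to the STEP 1 genotypes and compares the result with what STEP 2 reported. The rule and the function names are my own assumption for illustration, not something taken from the SURVIVOR source, and two records are far too few to confirm it.

```python
def alt_allele_count(gt):
    """Count alleles in a GT string that are neither reference nor missing."""
    alleles = gt.replace("|", "/").split("/")
    return sum(1 for allele in alleles if allele not in ("0", "."))


def most_non_ref(gts):
    """Pick the genotype with the most non-reference alleles (first on ties)."""
    return max(gts, key=alt_allele_count)


# GT values per caller column from the STEP 1 records above.
step1_gts = {
    "Sample1": ["0/1", "./.", "0/0"],
    "Sample2": ["0/0", "1/1", "1/1"],
}
# GT values reported in the STEP 2 record.
step2_gts = {"Sample1": "0/1", "Sample2": "1/1"}

for sample, gts in step1_gts.items():
    guess = most_non_ref(gts)
    print(sample, "guess:", guess, "observed:", step2_gts[sample])
```

Both samples agree with the guess here, but that only shows the rule is consistent with this one site.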
The options were SURVIVOR merge ... 1000 1 0 0 1 30, and the version is 1.0.7 from bioconda.
Thanks! Dan