
wrong ref allele given in vcf of calls

Open rmcolq opened this issue 9 years ago • 0 comments

When running the indep pipeline to create a massive vcf of variants in a group of samples, I get errors from the bcf merge step. Here are some examples of pairs of samples that have conflicting ref alleles in the raw vcf:

data2/users/phelim/ana/staph/cortex/results/C00001083/vcfs/C00001083_wk_flow_I_RefCC_FINALcombined_BC_calls_at_all_k.raw.vcf
R00000022 616858 UNION_BC_k31_var_12940 T A . PASS KMER=31;SVLEN=0;SVTYPE=SNP GT:COV:GT_CONF 1/1:0,209:121.99

/data2/users/phelim/ana/staph/cortex/results/C00001085/vcfs/C00001085_wk_flow_I_RefCC_FINALcombined_BC_calls_at_all_k.raw.vcf
R00000022 616858 UNION_BC_k31_var_9908 AGAT AAAC . PASS KMER=31;SVLEN=0;SVTYPE=PH_SNPS GT:COV:GT_CONF 0/1:21,125:5.28

[correct ref allele is T]

cortex/results/nctc/NCTC5655/vcfs/NCTC5655_wk_flow_I_RefCC_FINALcombined_BC_calls_at_all_k.raw.vcf
R00000022 1356994 UNION_BC_k31_var_778 G GA . PASS KMER=31;PV=6;SVLEN=1;SVTYPE=INS GT:COV:GT_CONF ./.:0,0:0.50

cortex/results/nctc/NCTC7972/vcfs/NCTC7972_wk_flow_I_RefCC_FINALcombined_BC_calls_at_all_k.raw.vcf
R00000022 1356994 UNION_BC_k31_var_8861 T TA . PASS KMER=31;PV=6;SVLEN=1;SVTYPE=INS GT:COV:GT_CONF 0/1:7,1:3.27

[the correct ref allele is T]

Perhaps we need a script that filters out sites at the same location whose ref alleles have different prefixes, run at the same time as the scripts that remove duplicates and label overlaps.
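As a rough illustration of what such a check could look like, here is a minimal Python sketch. It is not part of the cortex scripts; the function names (`parse_sites`, `refs_compatible`, `conflicting_sites`) are hypothetical. It assumes that two ref alleles at the same CHROM/POS are consistent only if one is a prefix of the other (e.g. `T` and `TA`), and it splits records on any whitespace, since only the first four VCF columns are needed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple

Site = Tuple[str, int]  # (CHROM, POS)


def parse_sites(lines: Iterable[str]) -> Iterator[Tuple[str, int, str]]:
    """Yield (chrom, pos, ref) from VCF text lines, skipping headers and blanks."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()  # tabs in a real VCF; any whitespace accepted here
        yield fields[0], int(fields[1]), fields[3].upper()


def refs_compatible(a: str, b: str) -> bool:
    """Assumption: alleles starting at the same position must share a prefix."""
    return a.startswith(b) or b.startswith(a)


def conflicting_sites(
    records: Iterable[Tuple[str, int, str]]
) -> Dict[Site, Set[str]]:
    """Return sites where at least two ref alleles disagree, with the alleles seen."""
    refs_at: Dict[Site, Set[str]] = defaultdict(set)
    for chrom, pos, ref in records:
        refs_at[(chrom, pos)].add(ref)

    conflicts: Dict[Site, Set[str]] = {}
    for site, refs in refs_at.items():
        ordered = sorted(refs)
        if any(
            not refs_compatible(a, b)
            for i, a in enumerate(ordered)
            for b in ordered[i + 1:]
        ):
            conflicts[site] = refs
    return conflicts


if __name__ == "__main__":
    # Records taken from the examples above (file paths omitted).
    example = [
        "R00000022 616858 UNION_BC_k31_var_12940 T A . PASS KMER=31;SVLEN=0;SVTYPE=SNP GT:COV:GT_CONF 1/1:0,209:121.99",
        "R00000022 616858 UNION_BC_k31_var_9908 AGAT AAAC . PASS KMER=31;SVLEN=0;SVTYPE=PH_SNPS GT:COV:GT_CONF 0/1:21,125:5.28",
        "R00000022 1356994 UNION_BC_k31_var_778 G GA . PASS KMER=31;PV=6;SVLEN=1;SVTYPE=INS GT:COV:GT_CONF ./.:0,0:0.50",
        "R00000022 1356994 UNION_BC_k31_var_8861 T TA . PASS KMER=31;PV=6;SVLEN=1;SVTYPE=INS GT:COV:GT_CONF 0/1:7,1:3.27",
    ]
    for (chrom, pos), refs in sorted(conflicting_sites(parse_sites(example)).items()):
        print(chrom, pos, sorted(refs))
```

This only flags the disagreement; it cannot say which allele is right. Deciding that would need a lookup against the reference sequence itself, which is not sketched here.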

rmcolq, Feb 10 '16 15:02