nano-snakemake
Problem in the sort_vcf rule
Hi Wouter.
This may not necessarily be a problem with the pipeline, but maybe you can give me a hand with it.
Most of the vcf files can be sorted without problems, but I'm getting an error with one that comes from "pbsv_combined". You can find the file here: https://file.io/FfYklD
The error I'm getting is this:
[E::vcf_parse_format] Incorrect number of FORMAT fields at GL000208.1:1
It's directly reproducible by running
bcftools sort genotypes.vcf
Any tips on what's going on would be greatly appreciated.
I think SURVIVOR is to blame here (tagging @fritzsedlazeck). For a BND/TRA record, the FORMAT column and the corresponding sample column look like this:
GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO
0/1:NA:47079723:0,0:++:.:TRA:pbsv.BND.GL000208.1:1-chr5:47079724:NA:NA:GL000208.1_1-chr5_47079724
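Note that the ID value in the sample column (pbsv.BND.GL000208.1:1-chr5:47079724) itself contains ":" for the coordinates, which is also the delimiter between FORMAT fields. The sample column therefore splits into 13 values for only 11 FORMAT keys, which is what bcftools complains about.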
You can take a look at this in your own data with for example things like this:
cat genotypes.vcf | grep -v '^#' | grep GL000208.1 | head -n 1 | cut -f9,12 | tr '\t' '\n' | tr ':' '\t' | column -ts $'\t' | less -S
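If you want to check the whole file rather than a single record, a rough sketch like the one below should list every record where a sample column has more ":"-separated values than the FORMAT column has keys. The function name find_format_mismatches is just something I made up for illustration, and it only flags surplus values, on the assumption that having fewer values is fine because VCF allows trailing sample fields to be dropped.

```shell
# Hypothetical helper (not part of bcftools or SURVIVOR): report records
# where a sample column holds more ':'-separated values than FORMAT has keys.
find_format_mismatches() {
    awk -F'\t' '
        /^#/ { next }                          # skip header lines
        NF >= 10 {
            n_keys = split($9, keys, ":")      # column 9 is FORMAT
            for (i = 10; i <= NF; i++) {       # columns 10+ are samples
                n_vals = split($i, vals, ":")
                if (n_vals > n_keys)
                    printf "%s:%s sample=%d keys=%d values=%d\n", $1, $2, i - 9, n_keys, n_vals
            }
        }
    ' "$1"
}
```

You would call it as find_format_mismatches genotypes.vcf. Each output line gives the CHROM:POS of an offending record, the index of the sample column, and the two counts.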
Right, I see it now. Let's see what Fritz thinks about it. Thanks!