bedtools
VCF output truncated if a blank line follows the header
Hi Aaron -
I have some VCF files with a blank line after the #CHROM... header line. BEDTools programs operating on these VCFs only report 2 columns (chrom, pos) rather than the entire line as expected.
Below is some troubleshooting that reproduces the problem. The test files are not full-spec VCFs with the whole header, but they do isolate the problem. I finally narrowed it down: truncation only happens when a line starts with "##" followed by some text on the same line, and there is a blank line before the data.
That is, here only a2.vcf and a6.vcf show the truncation. The others (including a3.vcf, which has "##", no following characters, and a blank line) are fine.
Knowing this, I can now fix my files to get them to work, but I don't see anything in the VCF spec that disallows blank lines like that -- hence the bug report.
$ cat b.bed
chr10 3100000 3200000
for i in 0 1 2 3 4 5 6
do
fn="a$i.vcf"
echo; echo "$fn original:"
cat "$fn"
echo; echo "intersected:"
bedtools intersect -a "$fn" -b b.bed; echo
done
a0.vcf original:
chr10 3127949 T A 46.3 . FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5 GT:PL:GQ 1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17
intersected:
chr10 3127949 T A 46.3 . FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5 GT:PL:GQ 1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17
a1.vcf original:
##fileformat=VCFv4.1
chr10 3127949 T A 46.3 . FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5 GT:PL:GQ 1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17
intersected:
chr10 3127949 T A 46.3 . FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5 GT:PL:GQ 1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17
a2.vcf original:
##fileformat=VCFv4.1

chr10 3127949 T A 46.3 . FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5 GT:PL:GQ 1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17
intersected:
chr10 3127949
a3.vcf original:
##

chr10 3127949 T A 46.3 . FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5 GT:PL:GQ 1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17
intersected:
chr10 3127949 T A 46.3 . FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5 GT:PL:GQ 1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17
a4.vcf original:
#comment
chr10 3127949 T A 46.3 . FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5 GT:PL:GQ 1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17
intersected:
chr10 3127949 T A 46.3 . FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5 GT:PL:GQ 1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17
a5.vcf original:
##fileformat=VCFv4.1
#CHROM
chr10 3127949 T A 46.3 . FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5 GT:PL:GQ 1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17
intersected:
chr10 3127949 T A 46.3 . FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5 GT:PL:GQ 1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17
a6.vcf original:
##fileformat=VCFv4.1
#CHROM

chr10 3127949 T A 46.3 . FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5 GT:PL:GQ 1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17
intersected:
chr10 3127949
Thanks for reporting this, Ryan. It seems odd to me that a blank line would be allowed in VCF. I will have a look at this soon, but am travelling to a meeting this week, so it might be a bit...
Thanks. No worries, take your time -- it's easy enough to ensure no blank lines in VCFs used as input.
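For anyone who hits the same truncation, the workaround mentioned above (making sure there are no blank lines in the input VCF) could look roughly like the sketch below. The function name strip_blank_lines and the demo file are made up for illustration and are not part of bedtools. The sketch assumes that dropping every empty or whitespace-only line is acceptable for your files.

```shell
# Hypothetical helper (not part of bedtools): print a file without
# its empty or whitespace-only lines.
strip_blank_lines() {
    grep -v '^[[:space:]]*$' "$1"
}

# Made-up demo input shaped like a2.vcf: a "##" header line with text,
# then a blank line, then one tab-separated data line.
workdir=$(mktemp -d)
printf '##fileformat=VCFv4.1\n\nchr10\t3127949\t.\tT\tA\n' > "$workdir/demo.vcf"

# Write the cleaned copy next to the original and show it.
strip_blank_lines "$workdir/demo.vcf" > "$workdir/demo.clean.vcf"
cat "$workdir/demo.clean.vcf"
```

The cleaned file can then be used in place of the original, for example with the same call as in the loop above: bedtools intersect -a demo.clean.vcf -b b.bed.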