
VCF output truncated, if blank line follows header

Open daler opened this issue 10 years ago • 2 comments

Hi Aaron -

I have some VCF files with a blank line after the #CHROM... header line. BEDTools programs operating on these VCFs only report 2 columns (chrom, pos) rather than the entire line as expected.

Below is some troubleshooting that reproduces the problem. The test files are not full-spec VCFs with the whole header, but they do isolate the problem. I finally narrowed it down: truncation only happens when the header contains a "##" line with some text after the "##", and there is a blank line before the data.

That is, only a2.vcf and a6.vcf show the truncation. The others (including a3.vcf, which has a bare "##" with no following characters, then a blank line) are fine.

Knowing this, I can now fix my files to get them to work, but I don't see anything in the VCF spec that disallows blank lines like that -- hence the bug report.

$ cat b.bed
chr10   3100000 3200000
$ for i in 0 1 2 3 4 5 6
do
 fn="a$i.vcf"
 echo; echo "$fn original:"
 cat "$fn"
 echo; echo "intersected:"
 bedtools intersect -a "$fn" -b b.bed; echo;
done
a0.vcf original:
chr10   3127949 T   A   46.3    .   FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5  GT:PL:GQ    1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17

intersected:
chr10   3127949 T   A   46.3    .   FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5  GT:PL:GQ    1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17


a1.vcf original:
##fileformat=VCFv4.1
chr10   3127949 T   A   46.3    .   FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5  GT:PL:GQ    1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17

intersected:
chr10   3127949 T   A   46.3    .   FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5  GT:PL:GQ    1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17


a2.vcf original:
##fileformat=VCFv4.1

chr10   3127949 T   A   46.3    .   FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5  GT:PL:GQ    1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17

intersected:
chr10   3127949


a3.vcf original:
##

chr10   3127949 T   A   46.3    .   FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5  GT:PL:GQ    1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17

intersected:
chr10   3127949 T   A   46.3    .   FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5  GT:PL:GQ    1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17


a4.vcf original:
#comment

chr10   3127949 T   A   46.3    .   FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5  GT:PL:GQ    1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17

intersected:
chr10   3127949 T   A   46.3    .   FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5  GT:PL:GQ    1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17


a5.vcf original:
##fileformat=VCFv4.1
#CHROM
chr10   3127949 T   A   46.3    .   FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5  GT:PL:GQ    1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17

intersected:
chr10   3127949 T   A   46.3    .   FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5  GT:PL:GQ    1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17


a6.vcf original:
##fileformat=VCFv4.1
#CHROM

chr10   3127949 T   A   46.3    .   FQ=-28.7;DP4=0,0,4,1;AC1=6.0;VDB=0.08892796;MQ=20;AF1=1.0;DP=5  GT:PL:GQ    1/1:0,0,0:4 1/1:0,0,0:4 1/1:79,15,0:17

intersected:
chr10   3127949
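
Since the truncation in a2.vcf and a6.vcf depends on a blank line sitting between the header and the data, a quick pre-check on input files can help. The following is only a minimal sketch: `list_blank_lines` is a hypothetical helper name, not part of bedtools, and it simply flags every empty or whitespace-only line without trying to reproduce the exact "##"-plus-text condition described above.

```shell
# Hypothetical helper (not part of bedtools): list blank lines in a file.
# grep -n prints each matching line prefixed with "<line number>:".
# Exit status is 0 if at least one blank line was found, 1 if none were.
list_blank_lines() {
    grep -n '^[[:space:]]*$' "$1"
}
```

For example, `list_blank_lines a2.vcf` would print `2:` for a file laid out like a2.vcf above, and print nothing for a file laid out like a1.vcf.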

daler • Oct 26 '13 00:10

Thanks for reporting this, Ryan. It seems odd to me that a blank line would be allowed in VCF. I will have a look at this soon, but I am travelling to a meeting this week, so it might be a bit...

arq5x • Oct 28 '13 18:10

Thanks. No worries, take your time -- it's easy enough to ensure no blank lines in VCFs used as input.

daler • Oct 28 '13 18:10
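
The workaround mentioned in the thread (making sure no blank lines reach bedtools) could be sketched roughly as follows. This is an illustration rather than a fix from the bedtools authors: `strip_blank_lines` is a hypothetical helper name, and it assumes that dropping every empty or whitespace-only line is acceptable for the file in question.

```shell
# Hypothetical helper (not part of bedtools): write a copy of the file to
# stdout with empty and whitespace-only lines removed. All other lines,
# including "##" and "#CHROM" header lines, pass through unchanged.
strip_blank_lines() {
    grep -v '^[[:space:]]*$' "$1"
}
```

A cleaned copy can then be used with the same command as in the report, for example `strip_blank_lines a6.vcf > a6.clean.vcf` followed by `bedtools intersect -a a6.clean.vcf -b b.bed`. The result has the same layout as a5.vcf, which was not truncated in the tests above.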