breakseq2
breakseq2 copied to clipboard
VCF coordinates appear to be 0-based instead of 1-based
The VCF output file appears to contain 0-based coordinates. As per the specification, VCF files should contain 1-based coordinates (https://samtools.github.io/hts-specs/VCFv4.1.pdf).
(FYI: VCF also specify that "telomeres are indicated by using positions 0 or N+1, where N is the length of the corresponding chromosome or contig.")
In the below example, the event in the VCF file should have position 16376517 instead of 16376516.
breakseq.vcf.gz:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PB0418-N
1 16376516 . C <DEL> . LowQual SVLEN=-9741;SVTYPE=DEL;END=16386257 GT:ABC:PE:REFCOUNTS 1/1:0,0,1:1:0,0
breakseq.out:
1 1KG_Phase1 Deletion 16376517 16386257 1 . C NAME C1WLGACXX130310:1:1301:1852:84776$1;MAPQ 37;CIGAR 76M;POS 61;END 137;LIBLEN 200;LSPAN 39;RSPAN 37;PE Y;SVLEN 9741
breakseq_genotyped.gff:
1 BreakSeq Deletion 16376517 16386257 1 . . QUAL LowQual;ABC 0,0,1;PE 1;SVLEN 9741;GT 1/1;COUNTS 0,0
The breakpoint example above is annotated in the library as:
$ grep 163765 breakseq2_bplib_20150129.gff
1 1KG_Phase1 Deletion 16376517 16386257 . . .
I am assuming that the breakpoint GFF library contains 1-based coordinate, as per the specification (http://useast.ensembl.org/info/website/upload/gff.html).
Invocation:
run_breakseq2.py \
--reference /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
--work work \
--bwa bwa-0.7.12/bwa \
--samtools samtools-0.1.19/samtools \
--bplib_gff bplib/breakseq2_bplib_20150129.gff \
--nthreads 4 \
--sample PB0418-N \
--bams /seq/picard_aggregation/C932/PB0418-N/v7/PB0418-N.bam \
The run log reports success on the last line:
INFO 2016-08-17 10:31:56,875 breakseq2_workflow Done! (342.241 s)