
Duplicate entries in annotated VCF file

Open tfwulff opened this issue 5 months ago • 1 comments

Describe the bug

Hi! I performed bacterial variant calling using `medaka_haploid_variant`:

```
medaka_haploid_variant -t 10 -m r1041_e82_400bps_sup_v4.3.0 -i <FASTQ> -r <REFERENCE> -o <OUTDIR>
```

The resulting medaka.annotated.vcf file contains duplicate variant entries which are not present in medaka.sorted.vcf.
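For anyone who wants to check their own output for the same symptom, here is a minimal sketch that counts records sharing the same CHROM, POS, REF and ALT. The function name `duplicate_records` and the example records are made up for illustration; they are not part of medaka and not taken from the real files. It assumes an uncompressed, tab-separated VCF.

```python
from collections import Counter


def duplicate_records(vcf_lines):
    """Return {(chrom, pos, ref, alt): count} for records seen more than once."""
    keys = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        # The first five VCF columns are CHROM, POS, ID, REF, ALT.
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys[(chrom, int(pos), ref, alt)] += 1
    return {key: n for key, n in keys.items() if n > 1}


# Made-up records for illustration only (hypothetical, not real medaka output).
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "contig1\t100\t.\tA\tG\t50\tPASS\t.",
    "contig1\t1400000\t.\tC\tT\t40\tPASS\t.",
    "contig1\t1400000\t.\tC\tT\t40\tPASS\t.",
]
print(duplicate_records(example))
```

On a real run, the same function can be given an open file handle, for example `duplicate_records(open("medaka.annotated.vcf"))`, and the result compared against the one for medaka.sorted.vcf.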

Logging

According to the stderr (relevant parts added below), the Annotate function seems to run twice on one region of the genome. All duplicate variant entries in the medaka.annotated.vcf are from this region (in this case between 1,312,608 and 1,519,097):

```
[12:03:48 - Annotate] Getting chrom coordinates
[12:03:48 - Annotate] Processing chunk with coordinates: contig1:19097-519097
[12:03:48 - Annotate] Processing chunk with coordinates: contig1:519097-1019097
[12:03:48 - Annotate] Processing chunk with coordinates: contig1:1019097-1519097
[12:03:48 - Annotate] Processing chunk with coordinates: contig1:1312608-1812608
```
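The overlap can be read straight off the logged coordinates. As a rough sketch (the helper names `parse_chunks` and `overlaps` are hypothetical and not part of medaka), the following parses the "Processing chunk" lines above and reports where consecutive chunks on the same contig overlap. It assumes the chunks are logged in order of increasing start position, as they are here.

```python
import re

# Matches the chunk lines exactly as they appear in the stderr excerpt above.
CHUNK_RE = re.compile(r"Processing chunk with coordinates: (\S+):(\d+)-(\d+)")


def parse_chunks(log_text):
    """Extract (contig, start, end) tuples from the Annotate log lines."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in CHUNK_RE.finditer(log_text)
    ]


def overlaps(chunks):
    """Return the overlapping span between each pair of consecutive chunks."""
    found = []
    for (c1, s1, e1), (c2, s2, e2) in zip(chunks, chunks[1:]):
        # Chunks that merely touch (next start == previous end) do not overlap.
        if c1 == c2 and s2 < e1:
            found.append((c1, max(s1, s2), min(e1, e2)))
    return found


log = """
[12:03:48 - Annotate] Getting chrom coordinates
[12:03:48 - Annotate] Processing chunk with coordinates: contig1:19097-519097
[12:03:48 - Annotate] Processing chunk with coordinates: contig1:519097-1019097
[12:03:48 - Annotate] Processing chunk with coordinates: contig1:1019097-1519097
[12:03:48 - Annotate] Processing chunk with coordinates: contig1:1312608-1812608
"""

print(overlaps(parse_chunks(log)))
# prints [('contig1', 1312608, 1519097)]
```

The reported span, 1,312,608 to 1,519,097, is the region in which the duplicate entries appear, which is consistent with the last two chunks both annotating it.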

Environment

  • Installation via conda
  • OS: Debian GNU/Linux 11 (bullseye)
  • medaka version 1.11.3
  • No GPU

tfwulff • Jan 26 '24 11:01

Hi @tfwulff,

Would it be possible for you to share your inputs so that we can investigate further? This is not something we've ever observed.

cjw85 • Mar 05 '24 15:03