genotyping missing as ref
Dear Jared
I'm genotyping some fungi with nanopolish variants using a vcf file with known variants as --genotypes:
python /path/to/software/nanopolish-0.9.2/nanopolish/scripts/nanopolish_makerange.py --overlap-length 0 /path/to/reference.fasta | parallel --results sample_nanopolish.results -P 3 nanopolish variants -m 0.8 -d 10 --genotype /path/to/SNPs_nanopore_merged.vcf -o sample_nanopolish_final_{1}.vcf -w {1} -p 1 -r /path/to/reads/sample_albacore_trimmed.fastq -b /path/to/alignments/sample_albacore_to_ref_bwa-mem_sorted.bam -g /path/to/reference.fasta -t 6
and I'd expect to get GT being 0 for the site being equal to the reference, 1 if it's the alternative allele and . if it's missing due to no alignment, for example. The fungus in question frequently loses chromosomes or parts thereof and also some chromosomes are rearranged heavily and have bits added and missing, so there are lots of regions which won't have any alignments.
The calls which should be missing do come out as 0 though, which is wrong, like in the example below which straddles a gap in the alignment:
`chr1.1 62654 . G A 0.0 PASS TotalReads=2;AlleleCount=0;SupportFraction=0.531566 GT 0 chr1.1 62657 . T C 0.0 PASS TotalReads=2;AlleleCount=0;SupportFraction=0.488378 GT 0 chr1.1 62724 . C G 0.0 PASS TotalReads=0;AlleleCount=0;SupportFraction=-nan GT 0 chr1.1 62758 . G A 0.0 PASS TotalReads=0;AlleleCount=0;SupportFraction=-nan GT 0 chr1.11 62790 . C T 0.0 PASS TotalReads=0;AlleleCount=0;SupportFraction=-nan GT 0
(...)
chr1.1 79986 . G A 0.0 PASS TotalReads=0;AlleleCount=0;SupportFraction=-nan GT 0 chr1.1 80030 . T C 0.0 PASS TotalReads=0;AlleleCount=0;SupportFraction=-nan GT 0 chr1.1 80131 . C T 0.0 PASS TotalReads=0;AlleleCount=0;SupportFraction=-nan GT 0 chr1.1 80328 . T C 0.0 PASS TotalReads=1;AlleleCount=0;SupportFraction=0.996983 GT 0 chr1.1 81011 . T C 20.5 PASS TotalReads=2;AlleleCount=1;SupportFraction=0.999939 GT 1 chr1.1 81138 . A G 0.0 PASS TotalReads=2;AlleleCount=0;SupportFraction=0.962576 GT 0 ` Is there any way in which I can amend that behaviour?
Kind regards,
A. Webb
Hi,
Thanks for reporting this. I'm on vacation for the next 10 days but I'll fix this when I get back. I will probably just output a genotype of "." (missing) when TotalReads=0.
Jared
Hi Jared
Thanks a lot for your fast reply! I've got two more suggestions for nanopolish variants:
- adding the read group ID from the bam file (where present) rather than "sample" to the resulting vcf file would be very useful
- throwing an error if the file after the --genotype parameter doesn't exist instead of printing empty output
Kind regards,
A. Webb
Hi Jared
has that issue been addressed in your new release?
Kind regards A.Webb
Hi,
Sorry but it hasn't. I'm travelling again now so it will take a little more time.
Jared
Commit 5a4939d will emit . for the genotype when TotalReads is 0
The --genotype argument appears to be handled properly:
variants: option '--genotype' requires an argument
I'd like to add full support for read groups in the future but I can't do it right now. I'm going to leave the genotype column as "sample" for now.