graphtyper
One region in Trio has very slow genotyping
Using a large library of breakpoints, I submitted one job per chromosome for the GIAB HG002 sample, each using 8 threads. Most chromosomes completed in under 8 hours. The exception is chromosome 4, which is stuck in one region and is still running after 2 more days. The same thing happened with the two parents.
Any suggestions on this?
-rw-r--r-- 1 farrell casa 726 Jul 7 01:20 workarea/HG002/chr4/048000001-049000000.vcf.gz.tbi
-rw-r--r-- 1 farrell casa 2.8M Jul 7 01:20 workarea/HG002/chr4/048000001-049000000.vcf.gz
-rw-r--r-- 1 farrell casa 664 Jul 5 14:56 workarea/HG002/chr4/047000001-048000000.vcf.gz.tbi
-rw-r--r-- 1 farrell casa 2.0M Jul 5 14:56 workarea/HG002/chr4/047000001-048000000.vcf.gz
Hi,
I ran into this problem before as well: all chromosomes finished genotyping except one. I split the variants and ran graphtyper on them one by one; maybe you can try that. You can also check the read depth within this region using samtools tview.
Best, Zhuqing
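As a non-interactive alternative to browsing the region in samtools tview, the depth check suggested above can be sketched in Python. This is only a minimal sketch: the function name high_depth_windows, the window size, and the factor threshold are illustrative assumptions, not part of graphtyper or samtools. It expects the three-column, tab-separated output of samtools depth (chromosome, position, depth) and reports windows whose mean depth is far above the overall mean.

```python
from collections import defaultdict


def high_depth_windows(depth_lines, window=1000, factor=5.0):
    """Flag windows whose mean depth exceeds `factor` times the overall mean.

    `depth_lines` is an iterable of `samtools depth` output lines
    (chrom <TAB> pos <TAB> depth). The defaults for `window` and `factor`
    are arbitrary illustrative choices, not recommended values.
    Returns a sorted list of (chrom, window_start, mean_depth) tuples.
    """
    sums = defaultdict(int)    # total depth per (chrom, window_start)
    counts = defaultdict(int)  # number of reported positions per window

    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        chrom, pos, depth = line.split("\t")[:3]
        # 1-based start coordinate of the window containing this position
        start = (int(pos) - 1) // window * window + 1
        sums[(chrom, start)] += int(depth)
        counts[(chrom, start)] += 1

    if not sums:
        return []

    # Means are taken over reported positions only; by default samtools
    # depth skips positions with zero coverage.
    overall = sum(sums.values()) / sum(counts.values())
    return sorted(
        (chrom, start, sums[(chrom, start)] / counts[(chrom, start)])
        for (chrom, start) in sums
        if sums[(chrom, start)] / counts[(chrom, start)] > factor * overall
    )
```

The input could come from a command such as samtools depth -r chr4:START-END sample.bam, where the region and the BAM file name are placeholders. Windows returned by the function are candidates for the slow region and could then be genotyped separately.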
What type of site should I look for that may cause the high CPU usage? There are quite a few variants in the region.
Is it high DP, the number of multiallelic variants at a site, or large SV size?
I would think this is happening in regions that have very high alignment depth. In earlier versions of graphtyper SV genotyping, we had a high coverage downsampling filter but later found out it was having a bad effect on quality so we turned it off.
I realize this is problematic, so I will experiment to see whether we can re-enable the filter but make it less aggressive than before.
Best, Hannes
@hannespetur Has there been any progress on testing a downsampling filter that is less aggressive?
I see that issue #58 has described the downsampling filter being tested using --avg_cov_by_readlen.
The --avg_cov_by_readlen option to subsample reads has been added to graphtyper v2.6.1.
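For anyone wanting to try the option, here is a hedged sketch of how a per-sample value could be derived. It assumes, as the option name suggests, that --avg_cov_by_readlen expects each sample's average coverage divided by its read length; please check the graphtyper documentation for the exact input format. Since coverage is roughly (number of reads × read length) / genome length, coverage divided by read length reduces to number of reads / genome length, which can be estimated from samtools idxstats output. The function name avg_cov_by_readlen below is a hypothetical helper, not a graphtyper API.

```python
def avg_cov_by_readlen(idxstats_text):
    """Estimate average coverage divided by read length for one BAM.

    `idxstats_text` is the output of `samtools idxstats`, whose
    tab-separated columns are: reference name, reference length,
    mapped read count, unmapped read count.

    Assumption: coverage / read length ~= total reads / total reference
    length. The trailing "*" line (reads with no reference) is skipped.
    """
    total_reads = 0
    total_ref_len = 0

    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4 or fields[0] == "*":
            continue
        total_ref_len += int(fields[1])
        total_reads += int(fields[2]) + int(fields[3])

    if total_ref_len == 0:
        raise ValueError("no reference sequences found in idxstats output")

    return total_reads / total_ref_len
```

As a sanity check, a 30x sample with 150 bp reads should give a value near 30 / 150 = 0.2.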