
One region in Trio has very slow genotyping

Open jjfarrell opened this issue 3 years ago • 6 comments

Using a large library of breakpoints, I submitted one job per chromosome for the GIAB HG002 sample, each with 8 threads. Most chromosomes completed in under 8 hours, but chromosome 4 is stuck in one region and is still running 2 more days later. The same thing happened with the two parents.

Any suggestions on this?

-rw-r--r-- 1 farrell casa  726 Jul  7 01:20 workarea/HG002/chr4/048000001-049000000.vcf.gz.tbi
-rw-r--r-- 1 farrell casa 2.8M Jul  7 01:20 workarea/HG002/chr4/048000001-049000000.vcf.gz

-rw-r--r-- 1 farrell casa  664 Jul  5 14:56 workarea/HG002/chr4/047000001-048000000.vcf.gz.tbi
-rw-r--r-- 1 farrell casa 2.0M Jul  5 14:56 workarea/HG002/chr4/047000001-048000000.vcf.gz

jjfarrell avatar Jul 08 '20 17:07 jjfarrell

Hi,

I ran into this problem before: all chromosomes finished genotyping except one. I split the variants and ran graphtyper on them one by one; maybe you can try that. You can also check the read depth within this region using samtools tview.
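As a rough illustration of the splitting idea (not graphtyper's own tooling), a region can be cut into smaller windows that are then submitted separately, so a single slow window does not hold up the rest. The function name `split_region`, the window size, and the coordinates are hypothetical placeholders, and the `chrom:start-end` string format is assumed to match what the genotyping job accepts.

```python
def split_region(chrom, start, end, window):
    """Split a 1-based inclusive region into consecutive sub-regions.

    Returns region strings in 'chrom:start-end' form. The last window is
    shortened if the region length is not a multiple of `window`.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if start > end:
        raise ValueError("start must not exceed end")
    regions = []
    pos = start
    while pos <= end:
        sub_end = min(pos + window - 1, end)
        regions.append(f"{chrom}:{pos}-{sub_end}")
        pos = sub_end + 1
    return regions


if __name__ == "__main__":
    # Placeholder coordinates: a 1 Mb region cut into 100 kb windows.
    for region in split_region("chr4", 1, 1_000_000, 100_000):
        print(region)
```

Each printed region can be run as its own job; the window that never finishes then points to the problematic site.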

Best, Zhuqing

biozzq avatar Jul 10 '20 01:07 biozzq

What type of site should I look for that might cause the high CPU usage? There are quite a few variants in the region.

Is it high DP, many multiallelic variants at a site, or large SVs?

jjfarrell avatar Jul 11 '20 15:07 jjfarrell

I would think this is happening in regions that have very high alignment depth. Earlier versions of graphtyper SV genotyping had a high-coverage downsampling filter, but we later found that it hurt quality, so we turned it off.

I realize this is problematic, so I will experiment to see whether we can re-enable the filter but make it less aggressive than before.
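One way to look for such high-depth regions is to scan per-base depth and flag windows far above the typical value. The sketch below assumes tab-separated `chrom`, `pos`, `depth` lines as printed by `samtools depth`; the function names, the 1 kb window, and the 5x-median cutoff are arbitrary choices for illustration, not anything graphtyper uses.

```python
from collections import defaultdict
from statistics import median


def window_mean_depth(lines, window=1000):
    """Mean depth per fixed-size window from `samtools depth` style lines.

    Keys are (chrom, window_start) with 1-based window starts. The mean is
    taken over the positions present in the input only, so positions that
    the depth tool left out (for example zero-depth sites) are ignored.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        chrom, pos, depth = line.split("\t")[:3]
        key = (chrom, (int(pos) - 1) // window)
        totals[key] += int(depth)
        counts[key] += 1
    return {
        (chrom, idx * window + 1): totals[(chrom, idx)] / counts[(chrom, idx)]
        for (chrom, idx) in totals
    }


def flag_high_depth(means, factor=5.0):
    """Return window keys whose mean depth exceeds factor * median window depth."""
    if not means:
        return []
    cutoff = factor * median(means.values())
    return sorted(key for key, value in means.items() if value > cutoff)
```

The input lines could come from piping `samtools depth -r REGION sample.bam` into a script that passes `sys.stdin` to `window_mean_depth`; flagged windows are candidates for the slow spots.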

Best, Hannes

hannespetur avatar Jul 15 '20 10:07 hannespetur

@hannespetur Has there been any progress on testing a downsampling filter that is less aggressive?

jjfarrell avatar Oct 15 '20 16:10 jjfarrell

I see that issue #58 describes the downsampling filter being tested via --avg_cov_by_readlen.

jjfarrell avatar Oct 17 '20 17:10 jjfarrell

The --avg_cov_by_readlen option to subsample reads has been added to graphtyper v2.6.1.
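Going by the option's name, the quantity involved is average coverage divided by read length. Since coverage is roughly (number of reads x read length) / reference length, that ratio reduces to reads per reference base. The sketch below estimates it from `samtools idxstats` output (columns: name, length, mapped, unmapped); the function name is hypothetical, counting only mapped reads is a choice made here, and how the value should be passed to `--avg_cov_by_readlen` is not stated in this thread, so check the graphtyper documentation.

```python
def avg_cov_by_readlen(idxstats_lines):
    """Estimate average coverage / read length from `samtools idxstats` lines.

    Equals total mapped reads divided by total reference length. The final
    '*' line (reads with no reference) is skipped.
    """
    total_reads = 0
    total_ref_len = 0
    for line in idxstats_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0] == "*":
            continue
        total_ref_len += int(fields[1])  # reference sequence length
        total_reads += int(fields[2])    # mapped read segments (assumption: mapped only)
    if total_ref_len == 0:
        raise ValueError("no reference sequences found in idxstats input")
    return total_reads / total_ref_len
```

For orientation, 30x coverage with 150 bp reads corresponds to a value of 0.2.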

hannespetur avatar Jan 08 '21 09:01 hannespetur