java.lang.IllegalArgumentException: the number of genotypes is too large for ploidy 8 and 55 alleles: approx. 3381098545

Open ChenDepp opened this issue 1 year ago • 3 comments

bug reports

Hi all,

When I run GATK (version 4.5.0.0) CombineGVCFs to combine the GVCFs of 240 samples with ploidy 8, it fails with the error shown in the title.

How can I solve it? Should I replace CombineGVCFs with GenomicsDBImport? I suspect that even if I obtain the merged GVCF file, the same error will be reported when I run GenotypeGVCFs. I look forward to your suggestions. Have a good day!

ChenDepp, May 22 '24 15:05

GenomicsDBImport is definitely the way to go for this kind of operation. On the other hand, STRs (short tandem repeats) are quite prone to errors, especially at higher ploidies, and they are typically the sites where this many alleles accumulate. You may wish to reduce them, or drop them completely if they are not of interest to you.
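The number in the error message follows from simple combinatorics: an unordered genotype at ploidy P over A alleles is a multiset of P alleles, so there are C(A + P - 1, P) possible genotypes. The short Python sketch below is only an illustration of that formula (it is not GATK's own code, and `genotype_count` is a hypothetical helper), but it reproduces the value in the title and shows how quickly the count grows with the number of alleles at ploidy 8.

```python
from math import comb


def genotype_count(ploidy, alleles):
    """Unordered genotypes = multisets of size `ploidy` drawn from `alleles` alleles."""
    return comb(alleles + ploidy - 1, ploidy)


# Genotype count at ploidy 8 as the number of alleles at a site grows.
for n_alleles in (2, 4, 8, 16, 55):
    print(n_alleles, genotype_count(8, n_alleles))
# 2 9
# 4 165
# 8 6435
# 16 490314
# 55 3381098545   <- the value in the error message
```

For comparison, a diploid site with the same 55 alleles has only `genotype_count(2, 55) == 1540` genotypes, which is why highly multi-allelic sites such as STRs are far more of a problem at ploidy 8 than in diploid data.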

gokalpcelik, May 22 '24 15:05

Hi @gokalpcelik, I used GenomicsDBImport instead of CombineGVCFs, but now I have a new problem: GenotypeGVCFs on the GenomicsDB workspace is very slow. It only produced a VCF for a 900K interval in 9 hours. How can I speed it up? Looking forward to your reply. Have a good day!

ChenDepp, May 28 '24 13:05

Hi again. You should be able to split your target regions into multiple intervals and import each interval in parallel, using a separate GenomicsDBImport instance per interval. Each resulting workspace can then be genotyped in parallel with GenotypeGVCFs, and the per-interval outputs finally combined into a single callset. This way you get your variants much faster. The approach is called scatter-gather; it is what we do and what we recommend.
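A minimal Python sketch of that scatter-gather workflow is shown below. It is an illustration under stated assumptions, not the GATK team's pipeline: the contig lengths, the 10 Mb chunk size, the file names `ref.fasta` and `sample_map.tsv`, and all helper functions are hypothetical. The GATK tools and arguments it calls (`GenomicsDBImport --sample-name-map/--genomicsdb-workspace-path/-L`, `GenotypeGVCFs -V gendb://...`, `GatherVcfs -I/-O`) exist in GATK4, but check them against the documentation for your version. By default it only prints the commands.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor

DRY_RUN = True  # only print the commands; set to False to actually run them


def run(cmd):
    """Run one command, or print it when DRY_RUN is set."""
    if DRY_RUN:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)


def split_intervals(contig_lengths, chunk_size):
    """Scatter: cut each contig into 1-based, inclusive intervals of at most chunk_size bp."""
    intervals = []
    for contig, length in contig_lengths.items():
        start = 1
        while start <= length:
            end = min(start + chunk_size - 1, length)
            intervals.append(f"{contig}:{start}-{end}")
            start = end + 1
    return intervals


def shard_commands(interval, index, reference, sample_map):
    """One shard: import a single interval into its own workspace, then genotype it."""
    workspace = f"gdb_shard_{index:04d}"  # hypothetical naming scheme
    vcf = f"shard_{index:04d}.vcf.gz"
    import_cmd = ["gatk", "GenomicsDBImport",
                  "--sample-name-map", sample_map,
                  "--genomicsdb-workspace-path", workspace,
                  "-L", interval]
    genotype_cmd = ["gatk", "GenotypeGVCFs",
                    "-R", reference,
                    "-V", f"gendb://{workspace}",
                    "-O", vcf]
    return import_cmd, genotype_cmd, vcf


def gather_command(shard_vcfs, output):
    """Gather: concatenate the per-shard VCFs (already in genomic order) into one callset."""
    cmd = ["gatk", "GatherVcfs"]
    for vcf in shard_vcfs:
        cmd += ["-I", vcf]
    return cmd + ["-O", output]


def run_shard(shard):
    import_cmd, genotype_cmd, _ = shard
    run(import_cmd)    # a shard's genotyping must wait for its own import
    run(genotype_cmd)


if __name__ == "__main__":
    contigs = {"chr1": 30_000_000, "chr2": 20_000_000}  # hypothetical lengths
    intervals = split_intervals(contigs, chunk_size=10_000_000)
    shards = [shard_commands(iv, i, "ref.fasta", "sample_map.tsv")
              for i, iv in enumerate(intervals)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(run_shard, shards))  # list() surfaces any shard failure
    run(gather_command([vcf for _, _, vcf in shards], "combined.vcf.gz"))
```

GatherVcfs expects its inputs in reference order and non-overlapping, which is why the shards keep the order produced by `split_intervals`; the contig order in the dictionary must therefore match the reference's sequence dictionary. In practice `gatk SplitIntervals` can generate the interval lists instead of the hand-rolled splitter, and a workflow manager or cluster scheduler would normally replace the thread pool.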

I hope this helps. Regards.

gokalpcelik, Jun 03 '24 19:06