HLA-TAPAS
Error: Exception in thread "main" java.lang.StackOverflowError
Hi @WansonChoi, I get an error when I run SNP2HLA. Can you help me see what's wrong? (There are 14390 samples.)
My overall command is as follows:
########################################
python -m SNP2HLA \
    --target inputDir/inputfile \
    --out outputDir/outputfile \
    --reference HLA-TAPASDir/HLA-TAPAS/resources/1000G.bglv4 \
    --nthreads 40 \
    --mem 160g
########################################
This is the log file; the error is as follows:
####################################################################
beagle.27Jun16.b16.jar (version 4.1)
Copyright (C) 2014-2015 Brian L. Browning
Enter "java -jar beagle.27Jun16.b16.jar" for a summary of command line arguments.
Start time: 05:27 PM CST on 26 Oct 2022
Command line: java -Xmx145636m -jar beagle.jar gt=outputDir/.MHC.QC.bgl.vcf ref=refDir/HLA-TAPAS/resources/1000G.bglv4.bgl.phased.vcf.gz impute=true gprobs=true nthreads=40 chrom=6 niterations=5 lowmem=true out=outputDir/.bgl.phased
No genetic map is specified: using 1 cM = 1 Mb
reference samples: 2504 target samples: 14390
Window 1 [ 6:27967817-32374131 ] reference markers: 50000 target markers: 20340
Starting burn-in iterations
Window=1 Iteration=1
Time for building model: 44 minutes 51 seconds
Time for sampling (singles): 3 hours 21 minutes 24 seconds
DAG statistics
mean edges/level: 428 max edges/level: 725
mean edges/node: 1.064 mean count/edge: 79
Window=1 Iteration=2
Time for building model: 3 hours 31 minutes 51 seconds
Time for sampling (singles): 5 hours 5 seconds
DAG statistics
mean edges/level: 685 max edges/level: 1368
mean edges/node: 1.041 mean count/edge: 49
Exception in thread "main" java.lang.StackOverflowError
    at dag.MergeableDag.similar(MergeableDag.java:374)
    at dag.MergeableDag.similar(MergeableDag.java:374)
    at dag.MergeableDag.similar(MergeableDag.java:374)
######################################################
@kaqisekuzi
Hi, kaqisekuzi. Thank you for your interest in HLA-TAPAS.
Could you try again after adding "-Xss1g" right after the "-Xmx"+_mem element of the BEAGLE variable? (https://github.com/immunogenomics/HLA-TAPAS/blob/a2bd506b9968611f571329c3978af4e939dc3254/SNP2HLA/SNP2HLA.py#L135)
Your log file shows Exception in thread "main" java.lang.StackOverflowError. A StackOverflowError means the thread ran out of stack memory, and the Java "-Xss" option sets the thread stack size. (https://faculty.washington.edu/browning/beagle/beagle_4.1_21Jan17.pdf - refer to the "2 Command line arguments" section)
Because 14390 samples is quite a large number, if the same StackOverflowError happens again, try increasing the stack size further.
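Where the option goes matters: JVM options such as -Xmx and -Xss only take effect when they appear before "-jar"; anything after the jar path is handed to BEAGLE as a program argument. The sketch below mirrors the BEAGLE line quoted later in this thread. The function name build_beagle_cmd and its parameters are hypothetical, not part of SNP2HLA.py.

```python
from typing import Optional


def build_beagle_cmd(java_tmp: str, mem: str, beagle_jar: str,
                     stack: Optional[str] = None) -> str:
    """Hypothetical helper: build the java prefix used to launch BEAGLE.

    JVM options (-Xmx, -Xss) must come before "-jar"; tokens after the
    jar path are passed to BEAGLE itself, not to the JVM.
    """
    parts = ["java", "-Djava.io.tmpdir=" + java_tmp, "-Xmx" + mem]
    if stack is not None:
        parts.append("-Xss" + stack)  # per-thread stack size
    parts += ["-jar", beagle_jar]
    return " ".join(parts)


if __name__ == "__main__":
    print(build_beagle_cmd("/tmp", "145636m", "beagle.jar", stack="1g"))
```

With stack="1g" this produces "java -Djava.io.tmpdir=/tmp -Xmx145636m -Xss1g -jar beagle.jar", so -Xss sits between -Xmx and -jar, as suggested above.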
@WansonChoi
Hi WansonChoi. Thank you for your reply.
I tried again, adding an -Xss option after the "-Xmx"+_mem element (I set it to "-Xss3g"):
> BEAGLE = ' '.join(["java", "-Djava.io.tmpdir="+JAVATMP, "-Xmx"+_mem,"-Xss3g","-jar", _beagle])
But there is a new error:
Error occurred during initialization of VM
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:717)
    at java.lang.ref.Reference.<clinit>(Reference.java:232)
Can you help me see what's wrong?
@kaqisekuzi
The OutOfMemoryError is probably related to the heap memory size you allocated with the '--mem' argument.
You may need to allocate more than 160g of memory to impute 14,390 samples.
If more than 160g exceeds the memory available on your system, you will have to split the samples and run SNP2HLA several times (e.g. 5000 + 5000 + 4390 samples => 3 runs of SNP2HLA).
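The split described above can be sketched as follows: cut the (FID, IID) list into consecutive chunks and write one keep file per chunk, in the "FID IID" per-line format that PLINK's --keep option reads, so each subset can be extracted and passed to SNP2HLA as its own --target. The helper names and file naming scheme are hypothetical, and the sample IDs are placeholders.

```python
import os
import tempfile
from typing import List, Sequence, Tuple

Sample = Tuple[str, str]  # (FID, IID), as in a PLINK .fam file


def split_samples(samples: Sequence[Sample], chunk_size: int) -> List[List[Sample]]:
    """Split samples into consecutive chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(samples[i:i + chunk_size])
            for i in range(0, len(samples), chunk_size)]


def write_keep_files(chunks: Sequence[Sequence[Sample]], prefix: str) -> List[str]:
    """Write one keep file per chunk, one 'FID IID' pair per line."""
    paths = []
    for k, chunk in enumerate(chunks, start=1):
        path = f"{prefix}.part{k}.keep"
        with open(path, "w") as fh:
            for fid, iid in chunk:
                fh.write(f"{fid} {iid}\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Placeholder IDs; in practice, read them from the target .fam file.
    samples = [(f"F{i}", f"I{i}") for i in range(14390)]
    chunks = split_samples(samples, 5000)
    print([len(c) for c in chunks])
    with tempfile.TemporaryDirectory() as out_dir:
        for path in write_keep_files(chunks, os.path.join(out_dir, "target")):
            print(os.path.basename(path))
```

With a chunk size of 5000, the 14390 samples fall into chunks of 5000, 5000 and 4390, matching the three runs in the example.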
@WansonChoi
Hi WansonChoi. Thank you for your reply.
My previous setting was "-Xss3g":
BEAGLE = ' '.join(["java", "-Djava.io.tmpdir="+JAVATMP, "-Xmx"+_mem,"-Xss3g","-jar", _beagle])
This time I used "-Xss1g" and, at the same time, divided the samples into three parts:
BEAGLE = ' '.join(["java", "-Djava.io.tmpdir="+JAVATMP, "-Xmx"+_mem,"-Xss1g","-jar", _beagle])
Now it works! Thanks for your software and work!
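One way to compare the "-Xss3g" and "-Xss1g" settings: Java thread stacks are allocated outside the heap that -Xmx controls, so each thread adds up to one -Xss worth of native memory on top of the heap. The sketch below is a rough worst-case estimate under stated assumptions: the heap comes from the -Xmx145636m in the log, the thread count is taken from nthreads=40, and JVM-internal threads, metaspace and GC structures are ignored. It does not show which setting actually caused the error.

```python
def jvm_reservation_mib(heap_mib: int, stack_mib: int, n_threads: int) -> int:
    """Rough worst-case estimate in MiB: heap plus one full stack per thread.

    Assumption: ignores metaspace, GC data structures and JVM-internal
    threads, so the real footprint differs.
    """
    return heap_mib + n_threads * stack_mib


HEAP_MIB = 145636  # from -Xmx145636m in the BEAGLE command line
THREADS = 40       # nthreads=40; JVM-internal threads are not counted

for label, stack_mib in [("-Xss3g", 3 * 1024), ("-Xss1g", 1024)]:
    total = jvm_reservation_mib(HEAP_MIB, stack_mib, THREADS)
    print(f"{label}: ~{total} MiB (~{total / 1024:.1f} GiB)")
```

Under these assumptions the estimate drops from about 262 GiB with -Xss3g to about 182 GiB with -Xss1g. Splitting the samples reduces the heap that each run needs.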
@WansonChoi
Hi WansonChoi. I have a new question.
Following your suggestion, I divided the 14390 samples into several subsets, ran each one separately, and obtained multiple VCF files. However, the eighth column (INFO) for the same site differs between the VCF files:
###################################################################################
a.vcf.file
#CHROM POS ID REF ALT QUAL FILTER INFO
6 27967817 rs146746088 A G . PASS AR2=0.45;DR2=0.46;AF=0.0055;IMP
6 27968350 rs78094982 G A . PASS AR2=0.72;DR2=0.72;AF=0.0069;IMP
###################################################################################
###################################################################################
b.vcf.file
#CHROM POS ID REF ALT QUAL FILTER INFO
6 27967817 rs146746088 A G . PASS AR2=0.50;DR2=0.50;AF=0.0061;IMP
6 27968350 rs78094982 G A . PASS AR2=0.65;DR2=0.65;AF=0.0057;IMP
###################################################################################
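The values differ because AR2 and DR2 (imputation-quality r-squared estimates) and AF (the estimated ALT allele frequency) are computed from the target samples in each run, so every subset gets its own estimates. IMP is a flag with no value that marks imputed markers. The sketch below parses an INFO string and pools AF across subsets as a sample-size-weighted mean. The subset sizes of 5000 are hypothetical. A simple average is not an exact recomputation of AR2 or DR2; those would have to be recomputed from the merged dosages.

```python
from typing import Dict, Sequence


def parse_info(info: str) -> Dict[str, str]:
    """Parse a VCF INFO string such as 'AR2=0.45;DR2=0.46;AF=0.0055;IMP'.

    Flag entries without '=' (e.g. IMP) map to an empty string.
    """
    fields = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else ""
    return fields


def pooled_af(afs: Sequence[float], n_samples: Sequence[int]) -> float:
    """Sample-size-weighted mean of per-subset ALT allele frequencies."""
    total = sum(n_samples)
    return sum(af * n for af, n in zip(afs, n_samples)) / total


info_a = parse_info("AR2=0.45;DR2=0.46;AF=0.0055;IMP")
info_b = parse_info("AR2=0.50;DR2=0.50;AF=0.0061;IMP")
# Hypothetical subset sizes; substitute the real number of samples per run.
af = pooled_af([float(info_a["AF"]), float(info_b["AF"])], [5000, 5000])
print(f"pooled AF for rs146746088 = {af:.4f}")
```

Weighting by sample count is equivalent to weighting by allele count here, because every diploid sample contributes two alleles.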
Will the INFO column be used in subsequent association analysis? If INFO does not affect the subsequent analysis, I will ignore this column when merging the VCF files. If it does, is there a way to run all 14390 samples without dividing them into multiple parts?
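If INFO is dropped when combining the subsets, the per-site step amounts to checking that the site columns match, writing "." (missing) for INFO, and concatenating the sample columns. The sketch below is a minimal illustration under assumptions: one data line per subset for the same site, identical FORMAT fields, and invented GT:DS values. It does not handle the "#CHROM" header line or sample names; a dedicated tool such as bcftools merge also takes care of headers, sorting and compression.

```python
from typing import List


def merge_site(lines: List[str]) -> str:
    """Merge one tab-separated VCF data line per subset for the same site.

    Keeps CHROM..FILTER from the first line, replaces INFO with '.',
    keeps FORMAT, and concatenates the sample columns of all lines.
    Assumes every line describes the same site with the same FORMAT.
    """
    rows = [line.rstrip("\n").split("\t") for line in lines]
    first = rows[0]
    for row in rows[1:]:
        if row[:5] != first[:5]:  # CHROM, POS, ID, REF, ALT
            raise ValueError(f"site mismatch: {row[:5]} vs {first[:5]}")
        if row[8] != first[8]:
            raise ValueError("FORMAT differs between subsets")
    merged = first[:7] + ["."] + [first[8]]
    for row in rows:
        merged.extend(row[9:])
    return "\t".join(merged)


# Hypothetical data lines; the GT:DS sample values are invented for illustration.
line_a = "\t".join(["6", "27967817", "rs146746088", "A", "G", ".", "PASS",
                    "AR2=0.45;DR2=0.46;AF=0.0055;IMP", "GT:DS", "0|0:0.01", "0|1:0.98"])
line_b = "\t".join(["6", "27967817", "rs146746088", "A", "G", ".", "PASS",
                    "AR2=0.50;DR2=0.50;AF=0.0061;IMP", "GT:DS", "0|0:0.00"])
merged = merge_site([line_a, line_b])
print(merged.split("\t")[7:])
```

Setting INFO to "." keeps the line valid VCF, but any downstream step that filters on DR2 or AR2 would then need those values recomputed for the merged set.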
@kaqisekuzi
I thought you were mainly interested in imputing HLA types.
Why don't you try HLA-TAPAS on the Michigan Imputation Server? It may provide enough computing resources to run all 14390 samples at once.
@WansonChoi
Hi WansonChoi. Thank you for your reply. I reconfigured the memory settings and the problem has been solved.