
Bootstrap analyses large data memory error

Open Ptero64 opened this issue 3 years ago • 2 comments

Hello, I am trying to run bootstrap analyses with ASTRAL on a large dataset (~14000 loci). Unfortunately, the run failed with a Java error message that seems to be memory-related.

To Reproduce

Here are the commands used:

#Multi-locus bootstrapping (MLBS) (use 1000 ufboot2 output from iqtree)

java -jar astral.5.7.7.jar -i ./Input_unrooted_tree/InputGenesTree_NoColap.trees -b bootstrap_genetrees_path.txt -r 1000 -s 1984 -o Results/Astral_MLBS_1000.tre 2>out_Astral_MLBS_1000.log

#version with Gene+Site resampling

java -jar astral.5.7.7.jar -i ./Input_unrooted_tree/InputGenesTree_NoColap.trees -b bootstrap_genetrees_path.txt -g -r 500 -s 1984 -o Results/Astral_MLBS_GeneSite_500.tre 2>out_Astral_MLBS_GeneSite_500.log

Log file

And the log file of out_Astral_MLBS_1000.log:

================== ASTRAL =====================

This is ASTRAL version 5.7.7
Gene trees are treated as unrooted
13388 trees read from ./Input_unrooted_tree/InputGenesTree_NoColap.trees
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
    at java.lang.StringBuffer.append(StringBuffer.java:367)
    at java.io.BufferedReader.readLine(BufferedReader.java:358)
    at java.io.BufferedReader.readLine(BufferedReader.java:389)
    at phylonet.coalescent.CommandLine.readTreeFileAsString(CommandLine.java:728)
    at phylonet.coalescent.CommandLine.readOptions(CommandLine.java:374)
    at phylonet.coalescent.CommandLine.main(CommandLine.java:485)

Log from out_Astral_MLBS_GeneSite_1000.log:

================== ASTRAL =====================

This is ASTRAL version 5.7.7
Gene trees are treated as unrooted
13388 trees read from ./Input_unrooted_tree/InputGenesTree_NoColap.trees
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:541)
    at java.lang.StringBuffer.append(StringBuffer.java:350)
    at java.util.regex.Matcher.appendReplacement(Matcher.java:888)
    at java.util.regex.Matcher.replaceAll(Matcher.java:955)
    at java.lang.String.replaceAll(String.java:2223)
    at phylonet.coalescent.CommandLine.readTreeFileAsString(CommandLine.java:730)
    at phylonet.coalescent.CommandLine.readOptions(CommandLine.java:374)
    at phylonet.coalescent.CommandLine.main(CommandLine.java:485)

Version

ASTRAL 5.7.7

Additional context

I tried to run it on an HPC cluster, requesting 10 cores x 50G memory (high-memory nodes). The input bootstrap trees are from IQ-TREE 2 (UFBoot). ASTRAL analyses (LPP) using the same input worked correctly.

Thank you in advance for the help,

regards nicolas

Ptero64, Jun 07 '21 09:06

I tried asking for 100 replicates, and this time I got this error message:

================== ASTRAL =====================

This is ASTRAL version 5.7.7
Gene trees are treated as unrooted
13388 trees read from ./Input_unrooted_tree/InputGenesTree_NoColap.trees
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:541)
    at java.lang.StringBuffer.append(StringBuffer.java:350)
    at java.util.regex.Matcher.appendReplacement(Matcher.java:888)
    at java.util.regex.Matcher.replaceAll(Matcher.java:955)
    at java.lang.String.replaceAll(String.java:2223)
    at phylonet.coalescent.CommandLine.readTreeFileAsString(CommandLine.java:730)
    at phylonet.coalescent.CommandLine.readOptions(CommandLine.java:374)
    at phylonet.coalescent.CommandLine.main(CommandLine.java:485)

Ptero64, Jun 07 '21 10:06

The issue is lack of memory. ASTRAL bootstrapping is a bit inefficient with memory. Two solutions come to mind.

  • If you have 50 GB on your machine, you can add the -Xmx option. For example, you can run:
java -Xmx47g -jar astral.5.7.7.jar -i ./Input_unrooted_tree/InputGenesTree_NoColap.trees -b bootstrap_genetrees_path.txt -g -r 100 -s 1984 -o Results/Astral_MLBS_GeneSite_500.tre 2>out_Astral_MLBS_GeneSite_500.log

This will tell Java that it can use up to 47G of memory. If you are running other things on that machine, you may want to reduce that a bit.

  • ASTRAL bootstrapping is nothing more than running ASTRAL 101 or 1001 times (once per replicate, plus once on the main input). If you manually run ASTRAL that many times, it will work just fine. To do that, you would need to:
  1. Create the 100 or 1000 bootstrap replicate inputs to ASTRAL. ASTRAL can do that using the -k bootstraps_norun option. So you would run:
java -Xmx47g -jar astral.5.7.7.jar -i ./Input_unrooted_tree/InputGenesTree_NoColap.trees -b bootstrap_genetrees_path.txt -g -r 100 -s 1984 -o Results/MLBS-reps -k bootstraps_norun 2>out_Astral_MLBS_GeneSite_500.log

This will produce files like Results/MLBS-reps.35.bs.

  2. Then, run ASTRAL on each of these files separately.
  3. Also run ASTRAL on the main input file with no bootstrapping.
  4. Draw bipartition support onto the main ASTRAL tree using the collection of ASTRAL bootstrap replicate trees. Many tools can do this. My favorite is RAxML's -f b option.
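
The remaining steps could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not a tested pipeline: the function names, the RUN dry-run switch, the .tre output naming, the raxmlHPC binary name, and the GTRCAT placeholder model are all assumptions. Only the Results/MLBS-reps.N.bs file naming and the -f b option come from the steps above. The exact flags RAxML requires may differ between versions, and whether it accepts ASTRAL's default branch annotations on the reference tree is not checked here.

```shell
# Sketch of the manual bootstrap workflow. All names below are illustrative
# assumptions; adjust jar path, heap size, and RAxML binary to your setup.
ASTRAL_JAR="${ASTRAL_JAR:-astral.5.7.7.jar}"
JAVA_HEAP="${JAVA_HEAP:-47g}"
RAXML="${RAXML:-raxmlHPC}"
RUN="${RUN:-}"   # set RUN=echo to print the commands instead of running them

# Run ASTRAL once per replicate file: PREFIX.N.bs -> PREFIX.N.tre
run_replicates() {
    local prefix="$1" rep
    for rep in "$prefix".*.bs; do
        [ -e "$rep" ] || continue   # glob matched nothing
        $RUN java "-Xmx$JAVA_HEAP" -jar "$ASTRAL_JAR" -i "$rep" -o "${rep%.bs}.tre"
    done
}

# Run ASTRAL on the main gene trees with no bootstrapping.
# Arguments: input gene tree file, output species tree file.
run_main() {
    $RUN java "-Xmx$JAVA_HEAP" -jar "$ASTRAL_JAR" -i "$1" -o "$2"
}

# Concatenate the replicate species trees into a single file.
# Arguments: replicate prefix, output file.
collect_replicates() {
    local prefix="$1" out="$2" rep
    : > "$out"
    for rep in "$prefix".*.bs; do
        if [ -e "${rep%.bs}.tre" ]; then
            cat "${rep%.bs}.tre" >> "$out"
        fi
    done
}

# Draw bipartition support from the replicate trees onto the main tree.
# Arguments: main tree, file of replicate trees, run name for RAxML output.
draw_support() {
    $RUN "$RAXML" -f b -m GTRCAT -t "$1" -z "$2" -n "$3"
}
```

After the replicate inputs exist, one would call run_replicates Results/MLBS-reps, then run_main on the original gene tree file, then collect_replicates and draw_support on the results. Setting RUN=echo first prints every command without executing it. Because each replicate is an independent ASTRAL run, the loop can also be split across separate cluster jobs, so no single process has to hold all replicates in memory.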

Hope one of these two solutions helps.
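
A side note on the first solution: when -Xmx is not given, the JVM picks its own maximum heap size, which is typically well below the memory requested from the scheduler. The sketch below shows one way to derive an -Xmx value that leaves some headroom and to inspect the JVM's default. The function names and the 3 GB default headroom are assumptions for illustration (chosen to match the 50 GB to 47g example above), and -XX:+PrintFlagsFinal is a HotSpot option that may not exist on other JVMs.

```shell
# heap_flag TOTAL_GB [HEADROOM_GB]
# Print an -Xmx flag that leaves HEADROOM_GB (default 3, an assumption)
# of the allocated memory free for the JVM itself and other processes.
heap_flag() {
    local total_gb="$1" headroom_gb="${2:-3}"
    echo "-Xmx$((total_gb - headroom_gb))g"
}

# Show the maximum heap size (in bytes) that the JVM would use by default.
default_max_heap() {
    java -XX:+PrintFlagsFinal -version 2>/dev/null | grep -w MaxHeapSize
}
```

For a 50 GB allocation, heap_flag 50 prints -Xmx47g, which can be placed directly after java on the command line.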

smirarab, Jun 23 '21 14:06