GADMA
GADMA copied to clipboard
GADMA slow for 5 population model
Hello,
I'm attempting to run a 5 population model using the Moments engine with SMAC_BO_combination Bayesian optimization. GADMA has been running for 50 hours but still reports No models yet. I'm running this on a cluster compute node with 8 cores and 640 GB of RAM.
Is this long runtime normal for a larger model? Or is there something I can do to help speed things up?
Custom model file:
import moments
import numpy as np
def model_func(params, ns):
"""
Five population demographic history for the species complex
populations:
A - Acin Elk
B - Acin upper Tennessee
C - Aana
D - Amay Cumberland basin
E - Amay Big South Fork
All pops have constant population size.
Order of splits in this model are based on prior phylogenetic results
Ancestral population was split (t1 + t2) time ago into two new
populations: AB and CDE. Next, CDE population was split into C and DE.
Then AB then split into two populations, A and B. Finally, DE splits
into two populations, D and E. There is symmetrical migration between pops A, B,
C, D, and E after the final split. But there is no migration between
ancestral populations.
:param nAB: Size of ancestral population AB after earliest split
:param nDCE: Size of ancestral population DCE after earliest split
:param nA: Size of extant population A
:param nB: Size of extant population B
:param nC: Size of extant population C
:param nDE: Size of ancestral population DE
:param nD: Size of extant population D
:param nE: Size of extant population E
:params mAB, mAC, mAD, mAE, mBC, mBD, mBE, mCD, mCE, mDE: migration rates between each pair of extant populations
:param t1: Time between ancestral population split to C - DE split
:param t2: Time between C - DE split and A - B split
:param t3: Time between A - B split and D - E split
:param t4: Time between D - E split to the present
"""
nAB, nDCE, nA, nB, nC, nDE, nD, nE, mAB, mAC, mAD, mAE, mBC, mBD, mBE, mCD, mCE, mDE, t1, t2, t3, t4 = params
sts = moments.LinearSystem_1D.steady_state_1D(sum(ns))
fs = moments.Spectrum(sts)
# first split: AB - CDE
fs = fs.split(idx = 0, n0 = ns[0] + ns[3], n1 = ns[1] + ns[2] + ns[4])
fs.integrate(Npop=[nAB, nDCE], tf=t1)
# second split: C - DE
fs = fs.split(idx = 1, n0 = ns[1], n1 = ns[2] + ns[4])
fs.integrate(Npop=[nAB, nC, nDE], tf=t2)
# third split: A - B
fs = fs.split(idx = 0, n0 = ns[0], n1 = ns[3])
fs.integrate(Npop=[nA, nC, nDE, nB], tf=t3)
# fourth split: D - E
fs = fs.split(idx = 2, n0 = ns[2], n1 = ns[4])
# Full 5x5 migration matrix
mig_mat = [
[0, mAC, mAD, mAB, mAE],
[mAC, 0, mCD, mBC, mCE],
[mAD, mCD, 0, mBD, mDE],
[mAB, mBC, mBD, 0, mBE],
[mAE, mCE, mDE, mBE, 0 ]
]
fs.integrate(Npop=[nA, nC, nD, nB, nE], tf=t4, m=mig_mat)
return fs(base)
Parameters file:
Input file: easySFS/output/dadi/A-C-D-B-E.sfs
Output directory: results/model_5pops_01_quick
Custom filename: models/model_5pops_01.py
Num init const: 2
Global optimizer: SMAC_BO_combination
Global maxeval: 200
Number of repeats: 8
Number of processes: 8
Linked SNP's: False
Engine: moments
# NEED TO REFINE THIS
Mutation rate: 2.35e-08
# NEED TO FIND REFERENCE FOR THIS
Time for generation: 2.0
# Effective length of the sequence used to build the SFS data.
# This should be used together with the Mutation rate and can be replaced
# by the Theta0 setting.
# Default: None
# https://gadma.readthedocs.io/en/latest/faq.html#how-can-i-get-effective-length-of-sequence-l
# Assume total length of sequence that was used for SFS building is equal to Nseq.
# From this data total number of X SNP’s were received.
# But not all of them were used for SFS: some of them (e.g. total number of Y SNP’s) were filtered out.
# Then we should count filtered SNP’s and take L value the following way:
# L = (X - Y) / X * Nseq
# in our case:
# X = 171832 [total number of variable sites in m80p _stats.txt file]
# 16823 SNPs in the m80p unlinked VCF, so
# Y = 155009
# so L = ((171832 - 155009) / 171832) * 3995963) = 391220
Sequence length: 391220
# Engine that will draw demographic model plots.
# Can be moments or demes.
# Default: moments
Model plot engine: demes
Units of time in drawing: thousand years
Thanks for your help, Dan