msprime
Clarify docs for population_size and initial_size
It looks to me as if the population_size and initial_size parameters are specified in number of individuals, not number of genomes. That is, they depend on the "ploidy" setting. For instance
import msprime

# Diploid: 6 individuals = 12 sampled genomes, population of 1/2 an individual
demography = msprime.Demography()
demography.add_population(initial_size=1/2)
msprime.sim_ancestry(samples=6, ploidy=2, demography=demography, random_seed=1234)
is the same as
demography = msprime.Demography()
demography.add_population(initial_size=1)
msprime.sim_ancestry(samples=12, ploidy=1, demography=demography, random_seed=1234)
Note that this differs (I think) from the msprime 0.7 situation, where diploidy was assumed, so you would always set Ne=1/2.
I think we should note this somewhere around here.
I was sure we made this clear - can you have another look and see if you can see where it's discussed? Maybe we just need some more links into this section from various places where sizes are discussed.
I think the section you mean is here:
https://tskit.dev/msprime/docs/latest/ancestry.html#sec-ancestry-ploidy
But we only talk about time scales. We never say that another way of looking at it is that the ploidy changes the total number of haploid genomes in the "population", because population sizes are measured in number of individuals. I.e. if you set pop_size=50 in a diploid model, you have 100 haploid genomes floating about, and to get the same results in a haploid model you need to set pop_size=100. Put another way, the following two incantations are equivalent.
ts = msprime.sim_ancestry(
    samples=[msprime.SampleSet(4, ploidy=1)],
    population_size=1,
    ploidy=1,
    random_seed=1234
)
and
ts = msprime.sim_ancestry(
    samples=[msprime.SampleSet(4, ploidy=1)],
    population_size=1/2,
    ploidy=2,
    random_seed=1234
)
It's definitely not clear to me, anyway.
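For what it's worth, here is a rough, msprime-free sketch of the arithmetic I have in mind. It assumes the standard coalescent, where a pair of lineages coalesces at rate 1/(ploidy * N) per generation; the helper names (haploid_genomes, pairwise_coalescence_rate, simulate_tmrca) are made up for illustration and are not part of msprime. It only shows that the two parameterisations give the same waiting times, not how msprime implements anything.

```python
import random


def haploid_genomes(population_size, ploidy):
    """Number of haploid genomes in a population of `population_size` individuals."""
    return population_size * ploidy


def pairwise_coalescence_rate(population_size, ploidy):
    """Per-generation coalescence rate for one pair of lineages (assumed model)."""
    return 1 / haploid_genomes(population_size, ploidy)


def simulate_tmrca(num_genomes, population_size, ploidy, seed):
    """Draw one time to the most recent common ancestor under a toy Kingman coalescent."""
    rng = random.Random(seed)
    rate = pairwise_coalescence_rate(population_size, ploidy)
    time = 0.0
    lineages = num_genomes
    while lineages > 1:
        pairs = lineages * (lineages - 1) / 2
        # Waiting time until any of the pairs coalesces.
        time += rng.expovariate(pairs * rate)
        lineages -= 1
    return time


# Four sampled genomes in both cases, as in the two sim_ancestry calls above.
haploid = simulate_tmrca(4, population_size=1, ploidy=1, seed=1234)
diploid = simulate_tmrca(4, population_size=1 / 2, ploidy=2, seed=1234)
print(haploid == diploid)
```

Both calls see one haploid genome's worth of population, so the rates and the seeded draws are identical.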