msprime

Simulating sex chromosomes

Open LukeAndersonTrocme opened this issue 2 years ago • 9 comments

I've already simulated 22 autosomes for ~1.5M samples using the FixedPedigree model, and I'm wondering how much work would be required to simulate sex chromosomes. We are planning on releasing these data on Zenodo. In a worst-case scenario, we can update the data dump to include sex chromosomes down the line, but it would be nice to include all chromosomes right from the start.

I had a very brief chat with @hyanwong and Léa Guyon about this topic at ProbGen22. From what I recall, it sounds like a pretty straightforward process that mostly involves careful pruning of the input pedigree to restrict the ancestry simulation to biologically plausible inheritance paths.

Here's what I think is needed, but please let me know if/where there are issues I'm overlooking:

Y chromosome

  • Prune out all pedigree links that are not male-male
  • Run simulation with zero recombination on this paternal pedigree (note: only male probands are simulated)
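A minimal sketch of the Y-chromosome pruning step in plain Python. The pedigree layout (a dict mapping each individual to a `(father, mother, sex)` tuple) and the function name `prune_for_y` are assumptions made for illustration; they are not part of msprime, and the result would still need converting into whatever pedigree input the simulation expects.

```python
# Hypothetical layout: individual id -> (father id, mother id, sex).
# None marks an unknown parent (a founder); sex is "M" or "F".
pedigree = {
    1: (None, None, "M"),
    2: (None, None, "F"),
    3: (1, 2, "M"),
    4: (1, 2, "F"),
    5: (None, None, "M"),
    6: (5, 4, "M"),
}


def prune_for_y(pedigree):
    """Keep only males, each linked only to his father."""
    pruned = {}
    for ind, (father, _mother, sex) in pedigree.items():
        if sex == "M":
            # The mother link is dropped; females are left out entirely.
            pruned[ind] = father
    return pruned


y_pedigree = prune_for_y(pedigree)
print(y_pedigree)
```

Each retained individual has a single parent, which is why the pruned pedigree behaves like a haploid, non-recombining genealogy.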

Mitochondria

  • Prune out all pedigree links from fathers to offspring (i.e. only keep links between mothers and offspring)
  • Run simulation with zero recombination on this mito pedigree (note: all probands are simulated)
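The mitochondrial case can be sketched the same way. The dict layout and the names `prune_for_mito` and `maternal_founder` are hypothetical, chosen only to illustrate the idea; nothing here is an msprime call.

```python
# Hypothetical layout: individual id -> (father id, mother id, sex).
# None marks an unknown parent (a founder); sex is "M" or "F".
pedigree = {
    1: (None, None, "M"),
    2: (None, None, "F"),
    3: (1, 2, "M"),
    4: (1, 2, "F"),
    5: (None, None, "M"),
    6: (5, 4, "M"),
}


def prune_for_mito(pedigree):
    """Keep every individual, each linked only to their mother."""
    return {ind: mother for ind, (_father, mother, _sex) in pedigree.items()}


def maternal_founder(mito_pedigree, ind):
    """Follow mother links upward until a founder is reached."""
    while mito_pedigree[ind] is not None:
        ind = mito_pedigree[ind]
    return ind


mito_pedigree = prune_for_mito(pedigree)
print(mito_pedigree)
print(maternal_founder(mito_pedigree, 6))
```

Unlike the Y pruning, males stay in the pedigree as probands; they simply never appear as anyone's parent.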

X chromosome

  • Partially prune pedigree to exclude male-male inheritance of X
  • Run simulation as any other autosome (with hapmap recombination map)
  • For male sample nodes, delete paternally inherited X chromosome
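A rough sketch of the partial pruning for the X chromosome, again in plain Python. The dict layout and the names `prune_for_x` and `x_copies` are assumptions for illustration only. The sketch covers the link pruning and the number of X copies per sample; it does not show how the result would be handed to the simulation.

```python
# Hypothetical layout: individual id -> (father id, mother id, sex).
# None marks an unknown parent (a founder); sex is "M" or "F".
pedigree = {
    1: (None, None, "M"),
    2: (None, None, "F"),
    3: (1, 2, "M"),
    4: (1, 2, "F"),
    5: (None, None, "M"),
    6: (5, 4, "M"),
}


def prune_for_x(pedigree):
    """Drop father-to-son links; keep every other parent-offspring link."""
    pruned = {}
    for ind, (father, mother, sex) in pedigree.items():
        # Sons receive their X from the mother only.
        x_father = father if sex == "F" else None
        pruned[ind] = (x_father, mother, sex)
    return pruned


def x_copies(sex):
    """Number of X copies a sample of this sex should end up with."""
    return 2 if sex == "F" else 1


x_pedigree = prune_for_x(pedigree)
print(x_pedigree)
```

Daughters keep both parental links, so the pedigree is only partially pruned, and the per-sample copy number is what distinguishes male from female probands afterwards.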

I think the Y and Mito are relatively straightforward to implement. The X chromosome may need some extra considerations, but AFAIK these only impact the sample nodes.

I guess my question is, does anyone have a sense of whether the FixedPedigree ancestry simulation will handle these partially pruned pedigrees?

Any comments or feedback are appreciated!

Cheers

LukeAndersonTrocme avatar Apr 21 '22 16:04 LukeAndersonTrocme

This is a great idea @LukeAndersonTrocme!

The only issue I can see with running the FixedPedigree sims on these pruned pedigrees is that we'll hit dead-ends more quickly, but that's the reality too.

I think it would take a bit of work to implement the "cutting" and fully validate the sims, so I'd vote for keeping this feature as something that's on the TODO list.

jeromekelleher avatar Apr 22 '22 05:04 jeromekelleher

Interesting point about the dead-ends, makes sense. And yes, agreed to keep this for a TODO. Either way, I might try to prune the pedigree myself and run the sims to see what happens. I might post some updates in this thread when I get around to it.

LukeAndersonTrocme avatar Apr 22 '22 14:04 LukeAndersonTrocme

It seems this is an msprime issue - shall I move it there?

benjeffery avatar Apr 22 '22 15:04 benjeffery

Commenting as I would like to know as well. How would you go about pruning male-male connections during simulation? I could be wrong, but my assumption is you'd have to do that to prevent erroneous recombination in males.

Also how do you track sex in msprime? I couldn't find anything in a cursory look through the documentation.

Darokrithia avatar Aug 09 '22 22:08 Darokrithia

> Also how do you track sex in msprime? I couldn't find anything in a cursory look through the documentation.

There are no specific features at the moment, but you can add arbitrary metadata to the individuals in your input pedigree so you can find the males/females later.

jeromekelleher avatar Aug 10 '22 08:08 jeromekelleher

Thanks for reviving this thread @Darokrithia.

I'm not sure how sex is tracked during the simulations. As far as I know, it is included in the metadata but not actually used in the simulation process for autosomes.

My comments about pruning the trees were referring to modifying the genealogy prior to the simulations. Just to flesh out a rough idea on this using a recursive tree climbing approach (likely only done once so may not need to be super efficient):

  • start at the bottom (i.e., male probands)
  • ascend one generation conditional on parental sex such that only males are recorded
  • repeat with the next generation of males
  • stop climbing a lineage when it reaches a founder; finish when every lineage has reached a founder

This pruned pedigree should then only have records of paternal inheritance.
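The climbing steps above can be sketched as follows. This is only an illustration under assumed names: the dict layout and `climb_paternal_lines` are hypothetical, and an explicit stack replaces recursion so that deep pedigrees do not hit Python's recursion limit.

```python
# Hypothetical layout: individual id -> (father id, mother id, sex).
# None marks an unknown parent (a founder); sex is "M" or "F".
pedigree = {
    1: (None, None, "M"),
    2: (None, None, "F"),
    3: (1, 2, "M"),
    4: (1, 2, "F"),
    5: (None, None, "M"),
    6: (5, 4, "M"),
    7: (None, None, "M"),  # male with no descendants among the probands
}


def climb_paternal_lines(pedigree, probands):
    """Climb from male probands through fathers until founders are reached."""
    pruned = {}
    # Start at the bottom: only male probands carry a Y chromosome.
    stack = [p for p in probands if pedigree[p][2] == "M"]
    while stack:
        ind = stack.pop()
        if ind in pruned:
            continue  # this lineage was already climbed via another proband
        father = pedigree[ind][0]
        pruned[ind] = father
        if father is not None:
            stack.append(father)  # ascend one generation
    return pruned


paternal = climb_paternal_lines(pedigree, probands=[3, 4, 6])
print(paternal)
```

Compared with simply filtering links, climbing from the probands also discards males who are not paternal ancestors of any sampled male, such as individual 7 here.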

If this sounds reasonable, then the next step is plugging this pruned pedigree into the msprime simulation. Would it be as simple as using a haploid inheritance model?

Where things get interesting is the dosage dependence of the X chromosome: male probands need only one copy of this chromosome, but female probands need two.

LukeAndersonTrocme avatar Aug 10 '22 16:08 LukeAndersonTrocme

image

Maybe obvious to most, but I found that drawing out the inheritance paths helped clarify things for me.

(edit: RED -- male probands potential inheritance, BLUE -- female probands potential inheritance, SOLID -- deterministic, DASHED -- stochastic, YELLOW -- highlights impossible inheritance of X)

LukeAndersonTrocme avatar Aug 10 '22 16:08 LukeAndersonTrocme

@LukeAndersonTrocme to clarify, does this require a pedigree to be generated (and then pruned) before simulation?

Darokrithia avatar Aug 11 '22 03:08 Darokrithia

Yes, exactly. Sorry, I should have clarified this from the top. This is in the context of fixed pedigree ancestry simulations.

LukeAndersonTrocme avatar Aug 11 '22 12:08 LukeAndersonTrocme