Implement Cecile Ane's polyploidy method as a new CAFE feature

Open benfulton opened this issue 7 years ago • 3 comments

It would be interesting to implement the method Ane developed in Rabier et al. (2014). Ane has given permission and we need to make sure to give credit where due.

https://www.ncbi.nlm.nih.gov/pubmed/24361993

Software: http://www.stat.wisc.edu/~ane/wgd/

benfulton avatar Mar 16 '17 00:03 benfulton

Installed R 3.4 to /usr/local/lib/R/bin, then ran:

    library("phyext")
    library("WGDgc")
    tre.string = "(D:{0,18.03},(C:{0,12.06},(B:{0,7.06},A:{0,7.06}):{0,2.49:wgd,0:0,2.50}):{0, 5.97});"
    tre.phylo4d = read.simmap(text=tre.string)
    dat = data.frame(A=c(2,2,3,1), B=c(3,0,2,1), C=c(1,0,2,2), D=c(2,1,1,1))
    a = processInput(tre.phylo4d, startingQ=0.9)
    getLikGeneCount(log(c(.01,.02)), a, dat, mMax=8, geomProb=1/1.5, conditioning="oneOrMore")

benfulton avatar May 09 '17 19:05 benfulton

With permission, I would like to implement this feature as part of my PhD thesis. I need the gene count history information output by CAFE plus the WGD handling of Ane and Rabier. I also think it's good to consolidate features in well-supported bioinformatics tools. CAFE is already a great program, and I'd like to contribute. I have 3 years of experience in C++ and 15 years of experience in other languages. I plan to fork the repo and submit a pull request when I'm done.

Key points to verify before I get started:

  1. How much overlap is there in the underlying mathematics of CAFE and WGDgc?
  2. Is it sufficient to model a WGD as an instantaneous spike in the birth parameter λ, while µ = 0?
  3. Will CAFE handle a tree containing an internal singleton node on a segment of length 0? This is how the WGD is annotated in the WGDgc SIMMAP format.
    • A possible solution is a segment of length 1, but this isn't great for polyploids < 1 Mya.
  4. Will λ = 1 over a segment of length 1 mean every gene family is doubled, or is it probabilistic growth with replacement, allowing for 3x and 4x duplicates?
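
To make points 2 and 4 concrete, here is a rough sketch (in Python, just for quick checking, not CAFE code) comparing two ways a WGD could be modelled. The first is a standard pure-birth (Yule) process with µ = 0, where each starting copy leaves a geometric number of descendants with success probability e^(-λt). The second is my assumption about how a WGD-specific step might look: every gene is duplicated and each extra copy is kept with probability q, which I am guessing corresponds to the `startingQ` argument in the R call above. The function names `yule_pmf` and `retention_pmf` are made up for this illustration.

```python
import math

def yule_pmf(k, n0, lam, t):
    """P(N(t) = k | N(0) = n0) for a pure-birth (Yule) process with mu = 0."""
    if k < n0:
        return 0.0  # with no death, the count can never drop
    p = math.exp(-lam * t)
    # Sum of n0 independent geometric(p) lineages: negative binomial.
    return math.comb(k - 1, n0 - 1) * p**n0 * (1 - p)**(k - n0)

def retention_pmf(k, n0, q):
    """ASSUMED WGD step: each of n0 genes gains one duplicate, kept with probability q."""
    extra = k - n0
    if extra < 0 or extra > n0:
        return 0.0  # at most one extra copy per original gene
    return math.comb(n0, extra) * q**extra * (1 - q)**(n0 - extra)

if __name__ == "__main__":
    # One starting copy, lambda = 1 over a segment of length 1, versus q = 0.9.
    for k in range(1, 6):
        print(k, round(yule_pmf(k, 1, 1.0, 1.0), 4), round(retention_pmf(k, 1, 0.9), 4))
```

If this is right, λ = 1 over length 1 does not double every family: a single copy stays single with probability e^(-1) (about 0.37), the expected count is e (about 2.72) rather than 2, and 3x or 4x outcomes have nonzero probability. The retention-style step caps growth at 2x, so the two models are not interchangeable.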

Would it be acceptable for me to implement this feature?

josiahseaman avatar Jul 12 '18 13:07 josiahseaman

Hi,

We would love to have you work on this. We are planning on a new codebase for CAFE that would be more appropriate for this work. I suggest we work together on a branch in the new repository as I suspect it will be a multiple-commit process. Send me a note (befulton at iu dot edu).

benfulton avatar Jul 13 '18 14:07 benfulton