avocado
Work on getting Mutect algorithm up and running
This PR is to track progress and collaborate on an implementation of the MuTect algorithm referenced in #114.
Can one of the admins verify this patch?
Jenkins, add to whitelist and test this please.
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/avocado-prb/99/ Test PASSed.
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/avocado-prb/100/ Test PASSed.
Thanks for putting this together @jstjohn! I'm going to work on a harness for actually running data through this. Essentially, I think I'm going to move org.bdgenomics.avocado.algorithms.mutect.Mutect into the org.bdgenomics.avocado.genotyping package and make something like the BiallelicGenotyper wrap it.
Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/avocado-prb/101/
Build result: FAILURE
GitHub pull request #167 of commit 4d5487ef8836aa22256228a4b1523d89b9e7dd08 automatically merged.
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-02 (centos) in workspace /home/jenkins/workspace/avocado-prb
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/bigdatagenomics/avocado.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/avocado.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/bigdatagenomics/avocado.git +refs/pull/:refs/remotes/origin/pr/
 > git rev-parse origin/pr/167/merge^{commit} # timeout=10
 > git branch -a --contains 1c347abe9060998abea65394ffddea3280c2a100 # timeout=10
 > git rev-parse remotes/origin/pr/167/merge^{commit} # timeout=10
Checking out Revision 1c347abe9060998abea65394ffddea3280c2a100 (origin/pr/167/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 1c347abe9060998abea65394ffddea3280c2a100
First time build. Skipping changelog.
Triggering avocado-prb » 2.2.0,centos
Triggering avocado-prb » 2.3.0,centos
Triggering avocado-prb » 1.0.4,centos
avocado-prb » 2.2.0,centos completed with result FAILURE
avocado-prb » 2.3.0,centos completed with result SUCCESS
avocado-prb » 1.0.4,centos completed with result SUCCESS
Test FAILed.
Jenkins, retest this please.
Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/avocado-prb/102/
Build result: FAILURE
GitHub pull request #167 of commit 4d5487ef8836aa22256228a4b1523d89b9e7dd08 automatically merged.
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-02 (centos) in workspace /home/jenkins/workspace/avocado-prb
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/bigdatagenomics/avocado.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/avocado.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/bigdatagenomics/avocado.git +refs/pull/:refs/remotes/origin/pr/
 > git rev-parse origin/pr/167/merge^{commit} # timeout=10
 > git branch -a --contains 1c347abe9060998abea65394ffddea3280c2a100 # timeout=10
 > git rev-parse remotes/origin/pr/167/merge^{commit} # timeout=10
Checking out Revision 1c347abe9060998abea65394ffddea3280c2a100 (origin/pr/167/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 1c347abe9060998abea65394ffddea3280c2a100
First time build. Skipping changelog.
Triggering avocado-prb » 2.2.0,centos
Triggering avocado-prb » 2.3.0,centos
Triggering avocado-prb » 1.0.4,centos
avocado-prb » 2.2.0,centos completed with result SUCCESS
avocado-prb » 2.3.0,centos completed with result FAILURE
avocado-prb » 1.0.4,centos completed with result SUCCESS
Test FAILed.
Jenkins, retest this please.
Looks like we have a Jenkins instance that is running Java 6. I've just modified our job setup to not use that machine.
Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/avocado-prb/103/
Build result: FAILURE
GitHub pull request #167 of commit 282011319f22e34fefdb890fbd3212d2daef6121 automatically merged.
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-02 (centos) in workspace /home/jenkins/workspace/avocado-prb
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/bigdatagenomics/avocado.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/avocado.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/bigdatagenomics/avocado.git +refs/pull/:refs/remotes/origin/pr/
 > git rev-parse origin/pr/167/merge^{commit} # timeout=10
 > git branch -a --contains 07dd99d2d4a6191fa87d2a5dafa049190d575b0c # timeout=10
 > git rev-parse remotes/origin/pr/167/merge^{commit} # timeout=10
Checking out Revision 07dd99d2d4a6191fa87d2a5dafa049190d575b0c (origin/pr/167/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 07dd99d2d4a6191fa87d2a5dafa049190d575b0c
First time build. Skipping changelog.
Triggering avocado-prb » 1.0.4,JDK 7u60,centos
Triggering avocado-prb » 2.2.0,JDK 7u60,centos
Triggering avocado-prb » 2.3.0,JDK 7u60,centos
avocado-prb » 1.0.4,JDK 7u60,centos completed with result FAILURE
avocado-prb » 2.2.0,JDK 7u60,centos completed with result FAILURE
avocado-prb » 2.3.0,JDK 7u60,centos completed with result FAILURE
Test FAILed.
Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/avocado-prb/104/
Build result: FAILURE
GitHub pull request #167 of commit af3040fae5ce0e8d0e9c9dc1a62da1bfceccece7 automatically merged.
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-02 (centos) in workspace /home/jenkins/workspace/avocado-prb
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/bigdatagenomics/avocado.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/avocado.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/bigdatagenomics/avocado.git +refs/pull/:refs/remotes/origin/pr/
 > git rev-parse origin/pr/167/merge^{commit} # timeout=10
 > git branch -a --contains c8230f5f8e72aa4c75b9f0ef584c852d803d7836 # timeout=10
 > git rev-parse remotes/origin/pr/167/merge^{commit} # timeout=10
Checking out Revision c8230f5f8e72aa4c75b9f0ef584c852d803d7836 (origin/pr/167/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c8230f5f8e72aa4c75b9f0ef584c852d803d7836
First time build. Skipping changelog.
Triggering avocado-prb » 1.0.4,JDK 7u60,centos
Triggering avocado-prb » 2.2.0,JDK 7u60,centos
Triggering avocado-prb » 2.3.0,JDK 7u60,centos
avocado-prb » 1.0.4,JDK 7u60,centos completed with result FAILURE
avocado-prb » 2.2.0,JDK 7u60,centos completed with result FAILURE
avocado-prb » 2.3.0,JDK 7u60,centos completed with result FAILURE
Test FAILed.
Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/avocado-prb/105/
Build result: FAILURE
GitHub pull request #167 of commit 3c1511d19ad9cb733cbbb8afbaadee783b8852fd automatically merged.
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-02 (centos) in workspace /home/jenkins/workspace/avocado-prb
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/bigdatagenomics/avocado.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/avocado.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/bigdatagenomics/avocado.git +refs/pull/:refs/remotes/origin/pr/
 > git rev-parse origin/pr/167/merge^{commit} # timeout=10
 > git branch -a --contains b46bfe4157c8c10eed7056890c0b00aceff844eb # timeout=10
 > git rev-parse remotes/origin/pr/167/merge^{commit} # timeout=10
Checking out Revision b46bfe4157c8c10eed7056890c0b00aceff844eb (origin/pr/167/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f b46bfe4157c8c10eed7056890c0b00aceff844eb
First time build. Skipping changelog.
Triggering avocado-prb » 1.0.4,JDK 7u60,centos
Triggering avocado-prb » 2.2.0,JDK 7u60,centos
Triggering avocado-prb » 2.3.0,JDK 7u60,centos
avocado-prb » 1.0.4,JDK 7u60,centos completed with result FAILURE
avocado-prb » 2.2.0,JDK 7u60,centos completed with result FAILURE
avocado-prb » 2.3.0,JDK 7u60,centos completed with result FAILURE
Test FAILed.
Just realized something: one MuTect filter is based on the distance between an allele and an indel within a read. For example, a read with an indel may be filtered out for an allele call adjacent to that indel, but the same read would be kept for an allele that is further away on the read. @fnothaft what would you recommend for this? It seems like it might be better to pass the reads themselves into MuTectGenotyper rather than adding yet another field to the Allele object and passing that in.
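To make the per-site nature of that filter concrete, here is a minimal sketch. It assumes a simplified read model: `SimpleRead`, the `CigarOp` types, and the `maxDistance` parameter are all hypothetical names for illustration, not the ADAM or avocado types, and the window rule is not claimed to match MuTect's exact criterion.

```scala
// Hypothetical, simplified read model for illustration only.
sealed trait CigarOp { def length: Int }
case class Match(length: Int) extends CigarOp
case class Insertion(length: Int) extends CigarOp
case class Deletion(length: Int) extends CigarOp

case class SimpleRead(start: Long, cigar: Seq[CigarOp]) {

  // Reference intervals [start, end) covered by each indel in this read.
  // An insertion consumes no reference bases, so it is anchored (as an
  // approximation) at the single reference position that follows it.
  def indelIntervals: Seq[(Long, Long)] = {
    var refPos = start
    val out = Seq.newBuilder[(Long, Long)]
    cigar.foreach {
      case Match(n)     => refPos += n
      case Insertion(_) => out += ((refPos, refPos + 1))
      case Deletion(n)  => out += ((refPos, refPos + n)); refPos += n
    }
    out.result()
  }

  // True if any indel in this read lies within maxDistance reference bases
  // of the candidate site. The answer depends on the site, not just the read.
  def hasIndelNear(site: Long, maxDistance: Int): Boolean =
    indelIntervals.exists { case (s, e) =>
      site >= s - maxDistance && site < e + maxDistance
    }
}
```

With a read starting at 100 and a 2 bp deletion after 20 matched bases, `hasIndelNear(118L, 5)` is true while `hasIndelNear(140L, 5)` is false, so the same read is dropped for one candidate site and kept for another. That is why the decision needs the read (or at least its indel positions) at genotyping time rather than a single flag on the allele.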
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/avocado-prb/108/ Test PASSed.
Added some new partially implemented features, along with tests which should not yet be passing.
Woot! Super close. Ok so Frank, the only thing I need to change is the known sites filter. The thing is that this is not a binary keep/throw-away situation. If the site is a known variant, then you have a different prior probability of the mutation being germline than if the site has never been seen before. See lines 167-169 of MutectGenotyper.scala for how this is used.
As you can see, right now I have the placeholder val dbSNPsite = false in my code. Ideally this would be the real value, or I could just do this filter later, during the postprocessing step as you have implemented in your latest PR, for example. So you don't necessarily throw out all sites that overlap dbSNP; you just need more evidence in the normal that the site is not simply a heterozygous site. The burden of proof is higher for the site being unique to the tumor.
One way to get this is to pass this information through with the variant and decide which log odds cutoff to use during the postprocessing step. If we go in this direction, what is the best place to stick this arbitrary bit of data, the normal log odds of being germline? Thanks for your time and thoughts on this @fnothaft!
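A rough sketch of what that postprocessing decision could look like, assuming the normal log odds and a dbSNP flag travel with the call. `SomaticCandidate`, `NormalLodFilter`, and their fields are hypothetical names, not existing avocado types. The two default cutoffs (2.2 for novel sites, 5.5 for dbSNP sites) are the values as I recall them from the MuTect publication and should be checked against it and treated as configurable.

```scala
// Hypothetical container for a candidate somatic call; field names are
// illustrative and do not correspond to existing avocado or ADAM types.
case class SomaticCandidate(
  position: Long,
  tumorLod: Double,
  normalLod: Double, // log odds that the normal is reference rather than het
  inDbSnp: Boolean)

object NormalLodFilter {
  // Assumed defaults, recalled from the MuTect paper; verify before relying on them.
  val NovelSiteThreshold: Double = 2.2
  val DbSnpSiteThreshold: Double = 5.5

  // Known dbSNP sites carry a higher prior of being germline, so they need
  // stronger evidence in the normal before being called tumor-specific.
  def threshold(c: SomaticCandidate): Double =
    if (c.inDbSnp) DbSnpSiteThreshold else NovelSiteThreshold

  def passes(c: SomaticCandidate): Boolean = c.normalLod >= threshold(c)
}
```

Under these assumptions, a candidate with a normal log odds of 3.0 passes at a novel site but fails at a dbSNP site, which is the "higher burden of proof" behavior described above rather than a blanket removal of dbSNP overlaps.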
@jstjohn oh, hah! Goofy mistake on my side. I think that is an easy change to make, actually! I will have a PR against this branch with the fix tomorrow.
Thanks Frank!!
On Sat, Aug 1, 2015 at 6:09 PM, Frank Austin Nothaft wrote:
@jstjohn oh, hah! Goofy mistake on my side. I think that is an easy change to make, actually! I will have a PR against this branch with the fix tomorrow.
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/avocado-prb/133/ Test PASSed.
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/avocado-prb/137/ Test PASSed.
Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/avocado-prb/138/