Work on getting Mutect algorithm up and running

Open jstjohn opened this issue 9 years ago • 21 comments

This PR is to track progress and collaborate on an implementation of the MuTect algorithm referenced in #114.

jstjohn avatar May 18 '15 23:05 jstjohn

Can one of the admins verify this patch?

AmplabJenkins avatar May 18 '15 23:05 AmplabJenkins

Jenkins, add to whitelist and test this please.

fnothaft avatar May 20 '15 18:05 fnothaft

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/avocado-prb/99/ Test PASSed.

AmplabJenkins avatar May 20 '15 19:05 AmplabJenkins

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/avocado-prb/100/ Test PASSed.

AmplabJenkins avatar May 20 '15 19:05 AmplabJenkins

Thanks for putting this together @jstjohn! I'm going to work on a harness for actually running data through this. Essentially, I think I'm going to move org.bdgenomics.avocado.algorithms.mutect.Mutect into the org.bdgenomics.avocado.genotyping package and make something like the BiallelicGenotyper wrap it.

fnothaft avatar May 25 '15 23:05 fnothaft

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/avocado-prb/101/

Build result: FAILURE

GitHub pull request #167 of commit 4d5487ef8836aa22256228a4b1523d89b9e7dd08 automatically merged.
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-02 (centos) in workspace /home/jenkins/workspace/avocado-prb
> git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
> git config remote.origin.url https://github.com/bigdatagenomics/avocado.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/avocado.git
> git --version # timeout=10
> git fetch --tags --progress https://github.com/bigdatagenomics/avocado.git +refs/pull/:refs/remotes/origin/pr/
> git rev-parse origin/pr/167/merge^{commit} # timeout=10
> git branch -a --contains 1c347abe9060998abea65394ffddea3280c2a100 # timeout=10
> git rev-parse remotes/origin/pr/167/merge^{commit} # timeout=10
Checking out Revision 1c347abe9060998abea65394ffddea3280c2a100 (origin/pr/167/merge)
> git config core.sparsecheckout # timeout=10
> git checkout -f 1c347abe9060998abea65394ffddea3280c2a100
First time build. Skipping changelog.
Triggering avocado-prb » 2.2.0,centos
Triggering avocado-prb » 2.3.0,centos
Triggering avocado-prb » 1.0.4,centos
avocado-prb » 2.2.0,centos completed with result FAILURE
avocado-prb » 2.3.0,centos completed with result SUCCESS
avocado-prb » 1.0.4,centos completed with result SUCCESS
Test FAILed.

AmplabJenkins avatar May 26 '15 16:05 AmplabJenkins

Jenkins, retest this please.

fnothaft avatar May 26 '15 16:05 fnothaft

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/avocado-prb/102/

Build result: FAILURE

GitHub pull request #167 of commit 4d5487ef8836aa22256228a4b1523d89b9e7dd08 automatically merged.
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-02 (centos) in workspace /home/jenkins/workspace/avocado-prb
> git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
> git config remote.origin.url https://github.com/bigdatagenomics/avocado.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/avocado.git
> git --version # timeout=10
> git fetch --tags --progress https://github.com/bigdatagenomics/avocado.git +refs/pull/:refs/remotes/origin/pr/
> git rev-parse origin/pr/167/merge^{commit} # timeout=10
> git branch -a --contains 1c347abe9060998abea65394ffddea3280c2a100 # timeout=10
> git rev-parse remotes/origin/pr/167/merge^{commit} # timeout=10
Checking out Revision 1c347abe9060998abea65394ffddea3280c2a100 (origin/pr/167/merge)
> git config core.sparsecheckout # timeout=10
> git checkout -f 1c347abe9060998abea65394ffddea3280c2a100
First time build. Skipping changelog.
Triggering avocado-prb » 2.2.0,centos
Triggering avocado-prb » 2.3.0,centos
Triggering avocado-prb » 1.0.4,centos
avocado-prb » 2.2.0,centos completed with result SUCCESS
avocado-prb » 2.3.0,centos completed with result FAILURE
avocado-prb » 1.0.4,centos completed with result SUCCESS
Test FAILed.

AmplabJenkins avatar May 26 '15 16:05 AmplabJenkins

Jenkins, retest this please.

Looks like we have a Jenkins instance that is running Java 6. I've just modified our job setup to not use that machine.

fnothaft avatar May 26 '15 16:05 fnothaft

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/avocado-prb/103/

Build result: FAILURE

GitHub pull request #167 of commit 282011319f22e34fefdb890fbd3212d2daef6121 automatically merged.
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-02 (centos) in workspace /home/jenkins/workspace/avocado-prb
> git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
> git config remote.origin.url https://github.com/bigdatagenomics/avocado.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/avocado.git
> git --version # timeout=10
> git fetch --tags --progress https://github.com/bigdatagenomics/avocado.git +refs/pull/:refs/remotes/origin/pr/
> git rev-parse origin/pr/167/merge^{commit} # timeout=10
> git branch -a --contains 07dd99d2d4a6191fa87d2a5dafa049190d575b0c # timeout=10
> git rev-parse remotes/origin/pr/167/merge^{commit} # timeout=10
Checking out Revision 07dd99d2d4a6191fa87d2a5dafa049190d575b0c (origin/pr/167/merge)
> git config core.sparsecheckout # timeout=10
> git checkout -f 07dd99d2d4a6191fa87d2a5dafa049190d575b0c
First time build. Skipping changelog.
Triggering avocado-prb » 1.0.4,JDK 7u60,centos
Triggering avocado-prb » 2.2.0,JDK 7u60,centos
Triggering avocado-prb » 2.3.0,JDK 7u60,centos
avocado-prb » 1.0.4,JDK 7u60,centos completed with result FAILURE
avocado-prb » 2.2.0,JDK 7u60,centos completed with result FAILURE
avocado-prb » 2.3.0,JDK 7u60,centos completed with result FAILURE
Test FAILed.

AmplabJenkins avatar May 26 '15 17:05 AmplabJenkins

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/avocado-prb/104/

Build result: FAILURE

GitHub pull request #167 of commit af3040fae5ce0e8d0e9c9dc1a62da1bfceccece7 automatically merged.
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-02 (centos) in workspace /home/jenkins/workspace/avocado-prb
> git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
> git config remote.origin.url https://github.com/bigdatagenomics/avocado.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/avocado.git
> git --version # timeout=10
> git fetch --tags --progress https://github.com/bigdatagenomics/avocado.git +refs/pull/:refs/remotes/origin/pr/
> git rev-parse origin/pr/167/merge^{commit} # timeout=10
> git branch -a --contains c8230f5f8e72aa4c75b9f0ef584c852d803d7836 # timeout=10
> git rev-parse remotes/origin/pr/167/merge^{commit} # timeout=10
Checking out Revision c8230f5f8e72aa4c75b9f0ef584c852d803d7836 (origin/pr/167/merge)
> git config core.sparsecheckout # timeout=10
> git checkout -f c8230f5f8e72aa4c75b9f0ef584c852d803d7836
First time build. Skipping changelog.
Triggering avocado-prb » 1.0.4,JDK 7u60,centos
Triggering avocado-prb » 2.2.0,JDK 7u60,centos
Triggering avocado-prb » 2.3.0,JDK 7u60,centos
avocado-prb » 1.0.4,JDK 7u60,centos completed with result FAILURE
avocado-prb » 2.2.0,JDK 7u60,centos completed with result FAILURE
avocado-prb » 2.3.0,JDK 7u60,centos completed with result FAILURE
Test FAILed.

AmplabJenkins avatar May 26 '15 20:05 AmplabJenkins

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/avocado-prb/105/

Build result: FAILURE

GitHub pull request #167 of commit 3c1511d19ad9cb733cbbb8afbaadee783b8852fd automatically merged.
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-02 (centos) in workspace /home/jenkins/workspace/avocado-prb
> git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
> git config remote.origin.url https://github.com/bigdatagenomics/avocado.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/avocado.git
> git --version # timeout=10
> git fetch --tags --progress https://github.com/bigdatagenomics/avocado.git +refs/pull/:refs/remotes/origin/pr/
> git rev-parse origin/pr/167/merge^{commit} # timeout=10
> git branch -a --contains b46bfe4157c8c10eed7056890c0b00aceff844eb # timeout=10
> git rev-parse remotes/origin/pr/167/merge^{commit} # timeout=10
Checking out Revision b46bfe4157c8c10eed7056890c0b00aceff844eb (origin/pr/167/merge)
> git config core.sparsecheckout # timeout=10
> git checkout -f b46bfe4157c8c10eed7056890c0b00aceff844eb
First time build. Skipping changelog.
Triggering avocado-prb » 1.0.4,JDK 7u60,centos
Triggering avocado-prb » 2.2.0,JDK 7u60,centos
Triggering avocado-prb » 2.3.0,JDK 7u60,centos
avocado-prb » 1.0.4,JDK 7u60,centos completed with result FAILURE
avocado-prb » 2.2.0,JDK 7u60,centos completed with result FAILURE
avocado-prb » 2.3.0,JDK 7u60,centos completed with result FAILURE
Test FAILed.

AmplabJenkins avatar May 26 '15 20:05 AmplabJenkins

Just realized something: one Mutect filter is the distance between an allele and an indel in a read. So, for example, a read with an indel may be filtered out for one allele call adjacent to that indel, but that same read would be included for an allele that is farther away on the read. @fnothaft what would you recommend for this? It seems like it might be good to just pass the reads themselves into MuTectGenotyper rather than adding yet another thing to the Allele object and passing that in.
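
To make the per-site behaviour concrete, here is a minimal Scala sketch of a read-level check keyed on the distance between a candidate site and any indel in the read. Everything in it is hypothetical: `SimpleRead`, `Indel`, `ProximalGapFilter`, and the 11-base default window are illustrative assumptions, not avocado's or ADAM's actual types and not necessarily MuTect's exact rule.

```scala
// Hypothetical, simplified model of an indel on the reference:
// half-open interval [refStart, refEnd). An insertion has refStart == refEnd.
final case class Indel(refStart: Long, refEnd: Long)

// Hypothetical, simplified read: only the fields this sketch needs.
final case class SimpleRead(name: String, start: Long, end: Long, indels: Seq[Indel])

object ProximalGapFilter {
  // Reference distance from a site to an indel; 0 if the site lies inside it.
  def distance(site: Long, indel: Indel): Long =
    if (site < indel.refStart) indel.refStart - site
    else if (site >= indel.refEnd) site - indel.refEnd + 1
    else 0L

  // Keep the read for this particular site only if every indel in the read
  // is more than `window` bases away. The default of 11 is an assumption.
  def keepRead(read: SimpleRead, site: Long, window: Long = 11L): Boolean =
    read.indels.forall(indel => distance(site, indel) > window)
}

// The same read is dropped for a nearby site and kept for a distant one.
val read = SimpleRead("r1", 90L, 190L, Seq(Indel(100L, 103L)))
val nearby = ProximalGapFilter.keepRead(read, 105L)  // false
val distant = ProximalGapFilter.keepRead(read, 150L) // true
```

Because the decision depends on the (read, site) pair rather than on the read alone, a check like this needs the reads' indel positions at genotyping time, which is the argument for passing the reads in rather than extending the Allele object.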

jstjohn avatar May 26 '15 20:05 jstjohn

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/avocado-prb/108/ Test PASSed.

AmplabJenkins avatar May 27 '15 00:05 AmplabJenkins

Added some new partially implemented features, along with tests which should not yet be passing.

jstjohn avatar May 30 '15 19:05 jstjohn

Woot! Super close. OK, so Frank, the only thing I need to change is the known sites filter. The thing is that this is not a binary keep/throw-away kind of situation. Basically, if the site is a known variant, then you have a different prior probability of the mutation being germline than if it is a site that has never been seen before. See lines 167-169 of MutectGenotyper.scala for how this is used.

As you can see, right now I have the placeholder val dbSNPsite = false in my code. Ideally this would be the real value, or I could just do this filter later, during the postprocessing step, as you have implemented in your latest PR for example. So you don't necessarily throw out all sites that overlap dbSNP; you just have to have more evidence in the normal that the site is actually not just a heterozygous site. The burden of proof is higher for the site being unique to the tumor.

One way to get this is to pass this information through with the variant, and decide which cutoff to use for the log odds score during the post processing step. If we go in this direction, what is the best place to stick this arbitrary bit of data, the normal log odds of being germline? Thanks for your time and thoughts on this @fnothaft!
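
As a rough illustration of that option, the sketch below carries a known-site flag and the normal log odds along with each candidate and picks the cutoff at postprocessing time. All names here (`SomaticCandidate`, `NormalLodFilter`) are hypothetical and are not avocado classes; the two cutoffs (2.3 for novel sites, 5.5 for dbSNP sites) are assumed from the defaults described in the MuTect publication and should be checked against whatever MutectGenotyper.scala actually uses.

```scala
// Hypothetical container for the values that would travel with each call.
final case class SomaticCandidate(
  contig: String,
  position: Long,
  normalLod: Double, // log odds that the normal sample is reference at this site
  inDbSnp: Boolean   // whether the site overlaps a known variant
)

object NormalLodFilter {
  // Assumed cutoffs; a known site needs stronger evidence in the normal.
  val NovelSiteThreshold: Double = 2.3
  val KnownSiteThreshold: Double = 5.5

  def threshold(inDbSnp: Boolean): Double =
    if (inDbSnp) KnownSiteThreshold else NovelSiteThreshold

  // A candidate is kept as somatic only if the normal evidence clears the
  // cutoff appropriate to its known-site status.
  def passes(c: SomaticCandidate): Boolean =
    c.normalLod >= threshold(c.inDbSnp)
}

// Identical normal evidence, different outcome depending on dbSNP status.
val novel = SomaticCandidate("chr1", 12345L, normalLod = 3.0, inDbSnp = false)
val known = novel.copy(inDbSnp = true)
val keptNovel = NormalLodFilter.passes(novel) // true
val keptKnown = NormalLodFilter.passes(known) // false
```

Keeping the raw log odds and the flag on the record, instead of a precomputed pass/fail, is what lets the cutoff be chosen (or changed) in the postprocessing stage.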

jstjohn avatar Aug 01 '15 22:08 jstjohn

@jstjohn oh, hah! Goofy mistake on my side. I think that is an easy change to make, actually! I will have a PR against this branch with the fix tomorrow.

fnothaft avatar Aug 02 '15 01:08 fnothaft

Thanks Frank!!


jstjohn avatar Aug 02 '15 02:08 jstjohn

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/avocado-prb/133/ Test PASSed.

AmplabJenkins avatar Aug 30 '15 17:08 AmplabJenkins

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/avocado-prb/137/ Test PASSed.

AmplabJenkins avatar Mar 10 '16 23:03 AmplabJenkins

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/avocado-prb/138/

Build result: FAILURE

GitHub pull request #167 of commit f7ad48be667c4e36751ed7120e873045fa2e73e9 automatically merged.
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-02 (centos spark-test spark-compile) in workspace /home/jenkins/workspace/avocado-prb
> git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
> git config remote.origin.url https://github.com/bigdatagenomics/avocado.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/avocado.git
> git --version # timeout=10
> git fetch --tags --progress https://github.com/bigdatagenomics/avocado.git +refs/pull/:refs/remotes/origin/pr/
> git rev-parse origin/pr/167/merge^{commit} # timeout=10
> git branch -a --contains 239bf462ff06fe881648bce0eb0c26fbe37e8618 # timeout=10
> git rev-parse remotes/origin/pr/167/merge^{commit} # timeout=10
Checking out Revision 239bf462ff06fe881648bce0eb0c26fbe37e8618 (origin/pr/167/merge)
> git config core.sparsecheckout # timeout=10
> git checkout -f 239bf462ff06fe881648bce0eb0c26fbe37e8618
First time build. Skipping changelog.
Triggering avocado-prb » 2.3.0,centos
Triggering avocado-prb » 1.0.4,centos
Triggering avocado-prb » 2.2.0,centos
avocado-prb » 2.3.0,centos completed with result FAILURE
avocado-prb » 1.0.4,centos completed with result FAILURE
avocado-prb » 2.2.0,centos completed with result SUCCESS
Test FAILed.

AmplabJenkins avatar Mar 11 '16 00:03 AmplabJenkins