gatk
(Do not merge) Port of nvscorevariants into GATK, with a basic tool frontend
Minimal GATK port of nvscorevariants from https://github.com/NVIDIA-Genomics-Research/nvscorevariants
The tool runs successfully in both 1D and 2D modes, and a strict integration test passes for the 1D model. However, this PR has a number of outstanding issues that need to be resolved before it can be merged and replace the legacy CNNScoreVariants tool:
- The conda environment in scripts/nvscorevariants_environment.yml needs to be incorporated into the main GATK conda environment.
- The integration test for the 2D model does not currently pass, even though it uses a much higher epsilon than the 1D test. Some scores differ significantly from the CNNScoreVariants 2D output, and we need to investigate why.
- Unlike the legacy CNN tool, there is currently no tool for training a new model.
@samuelklee and @mwalker174, could you please comment on what it would take to incorporate the scripts/nvscorevariants_environment.yml conda environment into the main GATK conda environment, assuming we are free to remove/retire the CNN tool?
@lbergelson and @zamirai, please do a general code review when you get a chance.
GitHub Actions tests reported job failures from Actions build 2935907552. Failures in the following jobs:
Test Type | JDK | Job ID | Logs |
---|---|---|---|
cloud | 11 | 2935907552.11 | logs |
cloud | 8 | 2935907552.10 | logs |
unit | 11 | 2935907552.13 | logs |
integration | 11 | 2935907552.12 | logs |
conda | 8 | 2935907552.3 | logs |
unit | 8 | 2935907552.1 | logs |
variantcalling | 8 | 2935907552.2 | logs |
integration | 8 | 2935907552.0 | logs |
Thanks, @droazen! @asmirnov239 has been looking at PyMC3 updates for gCNV, which will help unlock the conda environment. I understand he has a working branch but needs to do more testing; perhaps he can comment further?
Thanks @droazen! What data are you using to test the 2D model? And can we have access to your verification method?
GitHub Actions tests reported job failures from Actions build 3002176541. Failures in the following jobs:
Test Type | JDK | Job ID | Logs |
---|---|---|---|
cloud | 11 | 3002176541.11 | logs |
cloud | 8 | 3002176541.10 | logs |
unit | 11 | 3002176541.13 | logs |
integration | 11 | 3002176541.12 | logs |
unit | 8 | 3002176541.1 | logs |
integration | 8 | 3002176541.0 | logs |
variantcalling | 8 | 3002176541.2 | logs |
conda | 8 | 3002176541.3 | logs |
GitHub Actions tests reported job failures from Actions build 3092731818. Failures in the following jobs:
Test Type | JDK | Job ID | Logs |
---|---|---|---|
cloud | 8 | 3092731818.10 | logs |
cloud | 11 | 3092731818.11 | logs |
unit | 11 | 3092731818.13 | logs |
integration | 11 | 3092731818.12 | logs |
conda | 8 | 3092731818.3 | logs |
unit | 8 | 3092731818.1 | logs |
integration | 8 | 3092731818.0 | logs |
variantcalling | 8 | 3092731818.2 | logs |
@zamirai I've incorporated your patch from https://github.com/NVIDIA-Genomics-Research/nvscorevariants/commit/937ffafb78b0f3e7df9b1edc3b08d11e3ebee35a into this PR. With this change, the 2D tests now pass, even when I reduce the epsilon to 0.01. Thanks for the fix!
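The 1D and 2D integration tests check the ported tool's scores against the CNNScoreVariants output within an epsilon. Below is a minimal Python sketch of that kind of tolerance check, assuming the scores (e.g. the CNN_1D or CNN_2D INFO annotations) have already been parsed from both VCFs into dictionaries keyed by a hypothetical (contig, position, ref, alt) tuple. The function name find_score_mismatches and the sample values are illustrative only; they are not part of GATK or nvscorevariants, and the actual GATK integration test may compare outputs differently.

```python
from typing import Dict, List, Tuple

# Hypothetical variant key: (contig, position, ref allele, alt allele).
VariantKey = Tuple[str, int, str, str]


def find_score_mismatches(
    expected: Dict[VariantKey, float],
    actual: Dict[VariantKey, float],
    epsilon: float = 0.01,
) -> List[Tuple[VariantKey, float, float]]:
    """Return (key, expected, actual) for every variant whose scores differ by more than epsilon.

    A variant present in only one input is also reported, with NaN
    standing in for the missing score.
    """
    mismatches = []
    for key in sorted(set(expected) | set(actual)):
        exp = expected.get(key, float("nan"))
        act = actual.get(key, float("nan"))
        # abs() of a NaN difference is NaN, and NaN <= epsilon is False,
        # so variants missing from either side are flagged as mismatches.
        if not abs(exp - act) <= epsilon:
            mismatches.append((key, exp, act))
    return mismatches


if __name__ == "__main__":
    # Illustrative values only, not real tool output.
    legacy = {("chr20", 1000, "A", "G"): 2.131, ("chr20", 2000, "C", "T"): -1.402}
    ported = {("chr20", 1000, "A", "G"): 2.128, ("chr20", 2000, "C", "T"): -0.950}
    for key, exp, act in find_score_mismatches(legacy, ported, epsilon=0.01):
        print(key, exp, act)
```

Reporting every mismatch rather than stopping at the first one makes it easier to tell whether discrepancies are isolated or systematic, as they were for the 2D scores before the fix.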
@asmirnov239 is now working on merging the new conda environment into the GATK conda environment and making the necessary updates to existing tools. This will likely take at least a few more weeks.
GitHub Actions tests reported job failures from Actions build 3092905417. Failures in the following jobs:
Test Type | JDK | Job ID | Logs |
---|---|---|---|
cloud | 8 | 3092905417.10 | logs |
cloud | 11 | 3092905417.11 | logs |
unit | 11 | 3092905417.13 | logs |
integration | 11 | 3092905417.12 | logs |
unit | 8 | 3092905417.1 | logs |
conda | 8 | 3092905417.3 | logs |
variantcalling | 8 | 3092905417.2 | logs |
integration | 8 | 3092905417.0 | logs |
Rebased onto the latest master.
GitHub Actions tests reported job failures from Actions build 3291375153. Failures in the following jobs:
Test Type | JDK | Job ID | Logs |
---|---|---|---|
cloud | 8 | 3291375153.10 | logs |
cloud | 11 | 3291375153.11 | logs |
unit | 11 | 3291375153.13 | logs |
integration | 11 | 3291375153.12 | logs |
unit | 8 | 3291375153.1 | logs |
conda | 8 | 3291375153.3 | logs |
variantcalling | 8 | 3291375153.2 | logs |
integration | 8 | 3291375153.0 | logs |
GitHub Actions tests reported job failures from Actions build 3300297321. Failures in the following jobs:
Test Type | JDK | Job ID | Logs |
---|---|---|---|
cloud | 8 | 3300297321.10 | logs |
unit | 11 | 3300297321.13 | logs |
cloud | 11 | 3300297321.11 | logs |
conda | 8 | 3300297321.3 | logs |
integration | 11 | 3300297321.12 | logs |
unit | 8 | 3300297321.1 | logs |
variantcalling | 8 | 3300297321.2 | logs |
integration | 8 | 3300297321.0 | logs |
GitHub Actions tests reported job failures from Actions build 3300316784. Failures in the following jobs:
Test Type | JDK | Job ID | Logs |
---|---|---|---|
cloud | 8 | 3300316784.10 | logs |
cloud | 11 | 3300316784.11 | logs |
unit | 11 | 3300316784.13 | logs |
integration | 11 | 3300316784.12 | logs |
conda | 8 | 3300316784.3 | logs |
unit | 8 | 3300316784.1 | logs |
variantcalling | 8 | 3300316784.2 | logs |
integration | 8 | 3300316784.0 | logs |