long-read-pipelines
Always use GATK releases (or at least master branch)
Not consistent with best practices and will be a maintenance headache.
Done: we now fetch GATK from GitHub and build it anew in each Docker image that needs it.
This is not yet done as originally intended: https://github.com/broadinstitute/long-read-pipelines/blob/184b511fb4cd1d4c249361116f572efecdccfa18/docker/lr-utils/Dockerfile#L19
It still relies on a particular branch without pinning to a commit hash. Preferably it should be a hash on the master branch, but that would require a rigorous PR review process.
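To make the "pin to a hash" requirement checkable, a small CI-style script could flag every Git ref in a Dockerfile that is not a full 40-character commit hash. This is a minimal sketch, not something the repo currently has: the helper names `is_pinned` and `find_unpinned_refs` are hypothetical, and it assumes GATK is fetched with `git checkout <ref>` or `git clone --branch <ref>` on a single line. It does not follow `\` line continuations, and it flags refs passed through a build `ARG` (e.g. `${GATK_VERSION}`) as well as release tags, because neither can be verified statically as an immutable commit.

```python
import re

# A full Git SHA-1 commit hash is exactly 40 hexadecimal characters.
_FULL_SHA = re.compile(r"^[0-9a-f]{40}$")

# Assumed patterns (single line only): `git checkout <ref>` (skipping flags
# such as `-b`) and `git clone ... -b/--branch <ref>`.
_CHECKOUT = re.compile(r"git\s+checkout\s+(?!-)([^\s;&|]+)")
_BRANCH = re.compile(r"git\s+clone\b.*?\s(?:-b|--branch)\s+([^\s;&|]+)")


def is_pinned(ref: str) -> bool:
    """Return True if `ref` is a full commit hash rather than a branch or tag name."""
    return bool(_FULL_SHA.match(ref.lower()))


def find_unpinned_refs(dockerfile_text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, ref) for every Git ref that is not a full hash."""
    problems = []
    for lineno, line in enumerate(dockerfile_text.splitlines(), start=1):
        for pattern in (_CHECKOUT, _BRANCH):
            for ref in pattern.findall(line):
                if not is_pinned(ref):
                    problems.append((lineno, ref))
    return problems
```

For example, `find_unpinned_refs(open("docker/lr-utils/Dockerfile").read())` would list the offending lines; an empty list means every ref it recognizes is pinned to a commit.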
Sorry for repeatedly coming back to this issue. I searched the repo for mentions of "gatk" and found the following (documentation references omitted):
- https://github.com/broadinstitute/long-read-pipelines/blob/e89c70021995ee3818b683f652f0c37abfe8831d/wdl/tasks/UnalignedMetrics.wdl#L129
- https://github.com/broadinstitute/long-read-pipelines/blob/799f725582ffdd9aaec6603eb5eb971379081671/wdl/PB10xSingleProcessedSample.wdl#L212
- https://github.com/broadinstitute/long-read-pipelines/blob/713f878be083c02722f335d527d2dea795c58d18/wdl/tasks/PBUtils.wdl#L174
- https://github.com/broadinstitute/long-read-pipelines/blob/e89c70021995ee3818b683f652f0c37abfe8831d/wdl/tasks/AlignedMetrics.wdl#L582
- https://github.com/broadinstitute/long-read-pipelines/blob/e89c70021995ee3818b683f652f0c37abfe8831d/wdl/tasks/Utils.wdl#L20
None of these follow the recommended way of running GATK (via the launch script). Digging deeper, I also found that the Docker images referenced in those tasks are not official GATK images, which is not ideal.
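The search above could also be automated. GATK's documentation recommends running tools through the `gatk` launch script (e.g. `gatk --java-options "-Xmx4g" PrintReads ...`) rather than calling the jar with `java -jar`. The sketch below flags WDL lines that invoke a GATK jar directly. It assumes such calls look like `java ... -jar <path containing gatk>.jar` on one line; the helper names `find_direct_jar_calls` and `audit_repo` are hypothetical, and the check does not inspect which Docker image a task uses.

```python
import re
from pathlib import Path

# Assumption: a call that bypasses the launch script looks like
# `java [options] -jar <...gatk...>.jar ...` on a single line.
_DIRECT_JAR = re.compile(r"\bjava\b[^\n]*?-jar\s+\S*gatk\S*\.jar", re.IGNORECASE)


def find_direct_jar_calls(wdl_text: str) -> list[int]:
    """Return 1-based line numbers that run a GATK jar directly via `java -jar`."""
    return [
        lineno
        for lineno, line in enumerate(wdl_text.splitlines(), start=1)
        if _DIRECT_JAR.search(line)
    ]


def audit_repo(root: str) -> dict[str, list[int]]:
    """Map each .wdl file under `root` to the lines that bypass the launch script."""
    hits = {}
    for path in sorted(Path(root).rglob("*.wdl")):
        lines = find_direct_jar_calls(path.read_text(encoding="utf-8"))
        if lines:
            hits[str(path)] = lines
    return hits
```

Running `audit_repo("wdl")` from the repository root would return only the files and lines that still need to be migrated to the `gatk` launch script.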
Given this, we most likely need to
- merge our one-off long-read tools into the GATK repo (with all due diligence) and, preferably, ask for a release
- refactor the tasks so that each does one and only one job; that way we won't be forced to build a mega image based on the GATK image (which is huge).