docker-builds

New tool TBProfiler

Open jwarnn opened this issue 3 years ago • 2 comments

Hello; This was a requested container, Issue #416. It was something I looked into a few months ago but never completed. It uses the pipeline's conda package in the build. The test is very simple, but the pipeline is very large and running a full test would take some time. I hope this is helpful for the community.

Pull Request (PR) checklist:

  • [X] Include a description of what is in this pull request in this message.
  • [X] The dockerfile successfully builds to a test target for the user creating the PR. (i.e. docker build --tag samtools:1.15test --target test docker-builds/samtools/1.15 )
  • [X] Directory structure as name of the tool in lower case with special characters removed with a subdirectory of the version number (i.e. spades/3.12.0/Dockerfile)
    • [ ] (optional) All test files are located in same directory as the Dockerfile (i.e. shigatyper/2.0.1/test.sh)
  • [X] Create a simple container-specific README.md in the same directory as the Dockerfile (i.e. spades/3.12.0/README.md)
    • [X] If this README is longer than 30 lines, there is an explanation as to why more detail was needed
  • [X] Dockerfile includes the recommended LABELS
  • [X] Main README.md has been updated to include the tool and/or version of the dockerfile(s) in this PR
  • [X] Program_Licenses.md contains the tool(s) used in this PR and has been updated for any missing

jwarnn avatar Aug 04 '22 20:08 jwarnn

Thanks for the PR!

Does this also install the TBDB? https://github.com/jodyphelan/tbdb

My understanding is that the TBDB is updated every so often and that a database update would warrant a new docker image. I wonder if we should consider managing a docker image similar to pangolin & pangolin-data, where if one is updated, we usually create a new docker image.

It would be nice to also be able to pin the version of the TBDB so that we know exactly which version is included.

cc @frankambrosio3

kapsakcj avatar Aug 04 '22 21:08 kapsakcj

First off, I need to add downloading the db to the Dockerfile, but I think we should version which db is being used. I looked into what is built into the tool as far as db versions and releases. In TBProfiler, running tb-profiler update_tbdb performs a git pull of the tbdb, and you can pick a version (a branch of the tbdb repository) with the --branch, -b flag.

def main_update_tbdb(args):
    if pp.nofolder("tbdb"):
        pp.run_cmd("git clone https://github.com/jodyphelan/tbdb.git")
    os.chdir("tbdb")
    pp.run_cmd(f'git checkout {args.branch}')

    pp.run_cmd("git pull")
    tmp = "--match_ref %s" % args.match_ref if args.match_ref else ""
    pp.run_cmd("tb-profiler create_db %s --load" % tmp)
    os.chdir("../")
    pp.successlog("Sucessfully updated TBDB")

parser_sub.add_argument('--branch','-b',default="master",help='Storage directory')

There are four options: master, who, who_list, extended. Unfortunately, none of these branches are versioned or have releases, so right now it would be impossible to assign a TBProfiler_DB_VER or anything like that.
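As a rough illustration of the branch selection described above, here is a minimal Python sketch that builds the update command for one of the four branches. The helper name update_tbdb_command and the TBDB_BRANCHES tuple are hypothetical; only the update_tbdb subcommand and the --branch flag come from the code quoted above, and the branch list is just a snapshot from this thread.

```python
import shlex

# Branch names mentioned in this thread; a snapshot, not an authoritative list.
TBDB_BRANCHES = ("master", "who", "who_list", "extended")


def update_tbdb_command(branch="master"):
    """Build the argument list for `tb-profiler update_tbdb` on one tbdb branch.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    if branch not in TBDB_BRANCHES:
        raise ValueError(f"unknown tbdb branch: {branch!r}")
    return ["tb-profiler", "update_tbdb", "--branch", branch]


if __name__ == "__main__":
    # Print the command as it would be typed in a shell or a Dockerfile RUN step.
    print(shlex.join(update_tbdb_command("who")))
```

Because a branch name is not a fixed version, this only selects which branch is pulled; it does not pin the database to a specific commit.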

I have submitted an issue on the GitHub repository. I am sure a little more encouragement might help.

Should I place this PR in draft or anything like that till this gets resolved? Thanks.

jwarnn avatar Aug 05 '22 15:08 jwarnn

Resfinder has that issue as well (https://github.com/StaPH-B/docker-builds/blob/master/resfinder/4.1.11/Dockerfile)

Actually a lot of tools have that issue (including amrfinder). I recommend using

tb-profiler update_tbdb

and then specifying the database version/date/readout/whatever in the readme.

Similar to https://github.com/StaPH-B/docker-builds/tree/master/ncbi-amrfinderplus/3.10.36

erinyoung avatar Aug 12 '22 21:08 erinyoung

I have updated the Dockerfile so that it installs the db needed for it to run. Each release has a json file, found at ./TBProfiler-${TBPROFILER_VER}/db/tbdb.version.json, that specifies which commit on jodyphelan/tbdb the db matches.
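For anyone who wants to check which database a given image carries, here is a minimal Python sketch that reads a version file like the one above and prints whatever fields it contains. The function names are hypothetical, and the sketch deliberately does not assume any particular keys in tbdb.version.json, since the file's schema is not shown in this thread.

```python
import json
from pathlib import Path


def read_db_version(path):
    """Load a tbdb.version.json-style file and return its contents as a dict."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object in {path}")
    return data


def format_db_version(info):
    """Render every field as a 'key: value' line, sorted by key."""
    return "\n".join(f"{key}: {info[key]}" for key in sorted(info))


if __name__ == "__main__":
    import sys

    # Usage (path is supplied by the caller):
    #   python show_db_version.py /path/to/tbdb.version.json
    print(format_db_version(read_db_version(sys.argv[1])))
```

The printed lines could then be copied into the tool-specific README so the database commit is recorded next to the image.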

I also considered a more robust test, but this is a fairly large workflow with only a few entry points. The following test is still running for me after more than two and a half hours.

RUN mkdir test_run && \
    cd test_run && \
    wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR166/009/ERR1664619/ERR1664619_1.fastq.gz && \
    wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR166/009/ERR1664619/ERR1664619_2.fastq.gz && \
    tb-profiler profile -1 ERR1664619_1.fastq.gz -2 ERR1664619_2.fastq.gz -t 4 -p ERR1664619 --txt

jwarnn avatar Aug 19 '22 16:08 jwarnn

If I remember correctly, GitHub Actions time out at 30 minutes.

We can try GitHub Actions, and if it times out I'll run the test locally and push if it's successful.

erinyoung avatar Aug 19 '22 20:08 erinyoung

Added the more robust test, and it looks like it completed successfully!

jwarnn avatar Aug 19 '22 20:08 jwarnn

Looks like it ran successfully:

#11 11.99 Using ref file: /opt/conda/share/tbprofiler//tbdb.fasta
#11 11.99 Using gff file: /opt/conda/share/tbprofiler//tbdb.gff
#11 11.99 Using bed file: /opt/conda/share/tbprofiler//tbdb.bed
#11 11.99 Using version file: /opt/conda/share/tbprofiler//tbdb.version.json
#11 11.99 Using json_db file: /opt/conda/share/tbprofiler//tbdb.dr.json
#11 12.05 Using variables file: /opt/conda/share/tbprofiler//tbdb.variables.json
#11 12.05 Using spoligotype_spacers file: /opt/conda/share/tbprofiler//tbdb.spoligotype_spacers.txt
#11 12.05 Using spoligotype_annotations file: /opt/conda/share/tbprofiler//tbdb.spoligotype_list.csv
#11 12.05 Using barcode file: /opt/conda/share/tbprofiler//tbdb.barcode.bed

erinyoung avatar Aug 19 '22 20:08 erinyoung

Can you put the database version in your tool-specific readme? (tbprofiler/4.3/README.md)

erinyoung avatar Aug 19 '22 20:08 erinyoung

Here's an example:

https://github.com/StaPH-B/docker-builds/tree/master/ncbi-amrfinderplus/3.10.36

Perhaps I should add a database section in the readme template
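One possible shape for such a database section, sketched in Python: a small function that renders the section text from a database name, its source, and whatever version fields are known. The function name, the heading text, and the field labels are all assumptions made for illustration, not the repository's actual template.

```python
def database_section(db_name, source_url, fields):
    """Return README text describing the database bundled in an image.

    `fields` maps a label (for example "commit" or "date") to its value;
    the labels are chosen by the caller and are not a fixed schema.
    """
    lines = ["## Database", "", f"This image bundles {db_name} ({source_url})."]
    if fields:
        lines.append("")
        lines.extend(f"- {label}: {value}" for label, value in fields.items())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The commit value is a placeholder to be filled in per image.
    print(database_section(
        "TBDB",
        "https://github.com/jodyphelan/tbdb",
        {"commit": "<commit hash>"},
    ))
```

Keeping the fields free-form lets each tool record whatever identifier its database actually provides, whether that is a commit, a date, or a release number.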

erinyoung avatar Aug 19 '22 20:08 erinyoung

Works for me! I'm going to

  1. merge this
  2. make sure everything is there on quay (it's been a while, so this might take me... a while)
  3. get this container to quay and dockerhub

erinyoung avatar Aug 19 '22 21:08 erinyoung