docker-builds icon indicating copy to clipboard operation
docker-builds copied to clipboard

Requesting to add PolkaPox container to StaPH-B docker-builds

Open jessicarowell opened this issue 1 year ago • 7 comments

Pull Request (PR) checklist:

  • [x] Include a description of what is in this pull request in this message.
  • [x] The dockerfile successfully builds to a test target for the user creating the PR. (i.e. docker build --tag samtools:1.15test --target test docker-builds/samtools/1.15 )
  • [x] Directory structure as name of the tool in lower case with special characters removed with a subdirectory of the version number (i.e. spades/3.12.0/Dockerfile)
    • [ ] (optional) All test files are located in same directory as the Dockerfile (i.e. shigatyper/2.0.1/test.sh)
  • [x] Create a simple container-specific README.md in the same directory as the Dockerfile (i.e. spades/3.12.0/README.md)
    • [ ] If this README is longer than 30 lines, there is an explanation as to why more detail was needed
  • [x] Dockerfile includes the recommended LABELS
  • [x] Main README.md has been updated to include the tool and/or version of the dockerfile(s) in this PR
  • [x] Program_Licenses.md contains the tool(s) used in this PR and has been updated for any missing

jessicarowell avatar Feb 01 '24 19:02 jessicarowell

This is what I understand about this image:

You aren't actually trying to put polkapox into a container. Polkapox is a nextflow workflow, which can get very big. Instead, you have a process that you need for a python script that has some specific dependencies. This is similar to the concept of the berrywood-report-env image.

Specifically, I think you are trying to set up a container to run https://github.com/CDCgov/polkapox/blob/master/bin/mpxv-AssemblyGraph_gfaPy.py

This is a great use for this repository, and I would love to help!

Things that I should warn you about:

  1. Deployed images are built to the app stage. Anything after that is lost to the nether.
  2. We have a set of required labels for hosted images. Also, a set work directory of /data.
  3. You can remove the things you don't need (there's no reason to keep blast)

Do you have files that you use for unit testing? If so, we can add them to this repo or download them into the image during the test layer.

erinyoung avatar Feb 01 '24 20:02 erinyoung

This is what I understand about this image:

You aren't actually trying to put polkapox into a container. Polkapox is a nextflow workflow, which can get very big. Instead, you have a process that you need for a python script that has some specific dependencies. This is similar to the concept of the berrywood-report-env image.

Specifically, I think you are trying to set up a container to run https://github.com/CDCgov/polkapox/blob/master/bin/mpxv-AssemblyGraph_gfaPy.py

This is a great use for this repository, and I would love to help!

Things that I should warn you about:

  1. Deployed images are built to the app stage. Anything after that is lost to the nether.
  2. We have a set of required labels for hosted images. Also, a set work directory of /data.
  3. You can remove the things you don't need (there's no reason to keep blast)

Do you have files that you use for unit testing? If so, we can add them to this repo or download them into the image during the test layer.

This is true - we're not putting polkapox in a container. I should have been more explicit about that! I'm a little confused.
For 1) - what does it mean that deployed images are build to the app stage? Does this dockerfile not get pulled and then built by nextflow when it's called by a module? What parts are lost and how does that happen? For 2) Thanks - I added the labels. I think I set the work dir to /data (at the end) - should I install blastn to that directory too? I usually install things in /src but I don't think it's important since this container is not meant to stand alone. I can change that if needed. For 3) there is a process (line 443) that calls blast from a function (194) in that script.

If we expand this pipeline and require additional tools, we may just put them in this container, so that's why I kept this generic. But I can make it more specific, and maybe if we need additional functionality for PolkaPox we can add more containers within the same polkapox folder. Would that be ok?

jessicarowell avatar Feb 01 '24 21:02 jessicarowell

For 1) - what does it mean that deployed images are build to the app stage? Does this dockerfile not get pulled and then built by nextflow when it's called by a module? What parts are lost and how does that happen?

The dockerfile is built to the test stage only in github actions. Tests in bioinformatics often require fastq files, which are very large. We don't need the test files at run time, so each image is built to the app stage when it is hosted on dockerhub and quay. Anything after that (including your app2 and test) won't get built.

For 2) Thanks - I added the labels. I think I set the work dir to /data (at the end) - should I install blastn to that directory too? I usually install things in /src but I don't think it's important since this container is not meant to stand alone. I can change that if needed. For 3) there is a process (line 443) that calls blast from a function (194) in that script.

My apologies! blast is in the template dockerfile, and I assumed that it was in yours because you didn't feel like you could delete it. You can install blast wherever. /src/ is fine.

If we expand this pipeline and require additional tools, we may just put them in this container, so that's why I kept this generic. But I can make it more specific, and maybe if we need additional functionality for PolkaPox we can add more containers within the same polkapox folder. Would that be ok?

There are two ideals that run against each other.

  • The first ideal is to have each nextflow process have only the requisite environment it needs to run. This makes testing easier.
  • The second ideal is to have the fewest containers possible. This reduces the amount of space your workflow takes up for those users who use local resources.

I recommend you create a test for what you know you need. Then I recommend you add additional tests for when you add functionality later.

For example, we added plugins to the bcftools image in version 1.19 (https://github.com/StaPH-B/docker-builds/blob/master/bcftools/1.19/Dockerfile), so we added a test to check for the bcftools plugins.

Lastly, this repository is setup with the structure of tool/version/Dockerfile. You would need to create more directories for more dockerfiles/images. This simplifies things for the admins of this repo. We don't have a limit on the number of images we can host for you, but if there are going to be more than one I recommend they start with the same thing so they're easy to find.

A general guide (that you can use for inspiration, but I don't actually recommend):

  • polkapox/version/Dockerfile
  • polkapox-something-else/version/Dockerfile
  • polkapox-use-case2/version/Dockerfile

erinyoung avatar Feb 01 '24 22:02 erinyoung

Sorry - I forgot the maintainer labels yesterday. I've added those!

jessicarowell avatar Feb 02 '24 14:02 jessicarowell

Hi team! Are there any outstanding requirements for adding polkapox to docker-builds? Thanks for your help and thanks for maintaining this repo!

jessicarowell avatar Feb 06 '24 14:02 jessicarowell

There aren't really any issues. Did you want to include a test for your workflow? I recommend it.

Is it alright if I remove the Docker.io file?

erinyoung avatar Feb 06 '24 16:02 erinyoung

@jessicarowell , did you want to meet some time this week? I'd like to get this deployed for you.

erinyoung avatar Feb 12 '24 19:02 erinyoung

@jessicarowell , do you have any questions? I'd hate to hold up you and your team's workflow development.

erinyoung avatar Feb 27 '24 17:02 erinyoung

Hi Erin - I added test data and re-ran the workflow actions. Should be good to go now!
(Sorry this took so long. I had lost special access to build containers at CDC and it was a chore to get it back in order to add the test data.)

jessicarowell avatar Mar 25 '24 16:03 jessicarowell

Oh I also just realized I added an update to the tostadas container, which I'm also maintaining, and that made it's way into this PR. Do you want me to remove that Dockerfile and README and submit them separately, or change the title of this PR and leave them here? I just ran the workflow test on that one and it passed, so I can do whichever you prefer.

The tostadas update just contains 3 additional py libraries we need now.

jessicarowell avatar Mar 25 '24 17:03 jessicarowell

Your tests look great! In later Dockerfiles, you should be able to call these with wget (similar to freyja).

As for the tostadas Dockerfile, I'm going to admit that I get confused easily, so I recommend splitting the PRs.

erinyoung avatar Mar 25 '24 19:03 erinyoung