amazon-genomics-cli

User-specified dependencies in runtime environment

Open bballew opened this issue 1 year ago • 0 comments

Description

It would be great if there were a way for users to specify a minimal set of dependencies to be installed in the runtime environment.

Use Case

Specifically for Snakemake, a very common paradigm is to use pandas to read in a tab-separated manifest file and then query the dataframe with lambda functions in rules. In fact, pandas is part of the dependency chain for installing Snakemake via conda. However, it appears that pandas is not available in the Snakemake container used by agc. Since Snakemake is based on Python, I could see users wanting access to other Python libraries at runtime that might not currently be installed in the Snakemake container.
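For illustration, here is a minimal sketch of the manifest pattern described above. The column names (`sample`, `fastq`), the file paths, and the `fastq_for` helper are made up for this example, and `SimpleNamespace` only stands in for the wildcards object that Snakemake would pass to an input function.

```python
import io
from types import SimpleNamespace

import pandas as pd

# Hypothetical manifest contents; a real workflow would read a .tsv file from disk.
MANIFEST_TSV = "sample\tfastq\nA\tdata/A.fastq.gz\nB\tdata/B.fastq.gz\n"

# Load the tab-separated manifest and index it by sample name.
samples = pd.read_csv(io.StringIO(MANIFEST_TSV), sep="\t").set_index("sample")


def fastq_for(wildcards):
    """Look up the FASTQ path for the sample named in the wildcards."""
    return samples.loc[wildcards.sample, "fastq"]


# In a Snakefile, the same lookup is typically written inline, e.g.:
#   rule align:
#       input: lambda wildcards: samples.loc[wildcards.sample, "fastq"]

# Stand-in for the wildcards object, so the lookup can be tried outside Snakemake.
print(fastq_for(SimpleNamespace(sample="A")))
```

Without pandas in the container, the `import pandas` line fails before any rule is evaluated, which is why the whole workflow is affected rather than a single rule.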

Proposed Solution

  • For my specific situation, adding pandas to the Dockerfile and rebuilding the image would do the trick.
  • For a more generalizable solution, if conda and/or pip are present in the Docker container, perhaps users could write out a specification for the runtime environment in a requirements.txt or environment.yaml file. Maybe this file could be referenced in the MANIFEST.json file.
  • Alternatively, I wonder whether we could allow users to specify an alternate container.
  • Finally, if the container is being built from a recipe in this repo, would you consider accepting PRs to judiciously add dependencies?
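To make the second option more concrete, here is a rough sketch of how a requirements file referenced from MANIFEST.json could be turned into a pip command. This is purely hypothetical: the `runtimeRequirements` key, the `pip_install_command` function, and the example manifest contents are my own assumptions, not something agc supports today.

```python
import json
import sys
from pathlib import Path
from typing import List, Optional

# Hypothetical manifest key; agc does not currently define it.
REQUIREMENTS_KEY = "runtimeRequirements"


def pip_install_command(manifest: dict, project_dir: Path) -> Optional[List[str]]:
    """Build a pip command for the requirements file named in the manifest.

    Returns None when the manifest does not reference a requirements file.
    """
    relative_path = manifest.get(REQUIREMENTS_KEY)
    if relative_path is None:
        return None
    return [sys.executable, "-m", "pip", "install", "-r", str(project_dir / relative_path)]


# Illustrative manifest contents with the hypothetical key added.
manifest = json.loads(
    '{"mainWorkflowURL": "Snakefile", "runtimeRequirements": "requirements.txt"}'
)
cmd = pip_install_command(manifest, Path("workflow"))
print(cmd)
```

The engine could run a command like this (for example via `subprocess.run(cmd, check=True)`) before invoking Snakemake. An environment.yaml variant would follow the same shape with conda in place of pip.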

Thanks!

bballew, Jun 14 '23 13:06