feat: add mehari wrappers
This PR adds wrappers for mehari:
- `annotate-seqvars` for annotating VCF/VCF.gz/BCF (sequence variants, SNVs, indels, excluding structural variants)
- `download-transcript-db` for downloading transcript databases required for annotation of SO terms / consequences
- `download-clinvar-db` for downloading ClinVar databases required for annotation of ClinVar VCVs
Right now there's no corresponding download-frequencies-db for gnomAD, as there's no publicly hosted version of those available yet.
There is also no annotate-strucvars wrapper yet; this will likely be added later in a separate PR.
TODO:
- [x] Test for download-clinvar-db
- [x] Subset transcript db and reference fasta for a proper but small enough `annotate-seqvars` test
Btw: Does this repository make use of git lfs yet?
QC
- [x] I confirm that I have followed the documentation for contributing to `snakemake-wrappers`.
While the contributions guidelines are more extensive, please particularly ensure that:
- [x] `test.py` was updated to call any added or updated example rules in a `Snakefile`
- [x] `input:` and `output:` file paths in the rules can be chosen arbitrarily
- [x] wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`)
- [x] temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to
- [x] the `meta.yaml` contains a link to the documentation of the respective tool or command under `url:`
- [x] conda environments use a minimal amount of channels and packages, in recommended ordering
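For reference, the two checklist items about inferring arguments from file extensions and placing temporary files under `tempfile.gettempdir()` could look roughly like the sketch below. The suffix-to-format mapping and both function names are hypothetical and are not taken from the wrappers in this PR.

```python
import tempfile
from pathlib import Path

# Hypothetical mapping from file suffix to a format label; the real wrappers
# may use different labels or rely on the tool's own detection.
SUFFIX_TO_FORMAT = {
    ".vcf": "vcf",
    ".vcf.gz": "vcf.gz",
    ".bcf": "bcf",
}


def infer_format(path: str) -> str:
    """Return the format label implied by the file name, or raise ValueError."""
    name = Path(path).name.lower()
    # Try longer suffixes first so that ".vcf.gz" is not mistaken for ".vcf".
    for suffix in sorted(SUFFIX_TO_FORMAT, key=len, reverse=True):
        if name.endswith(suffix):
            return SUFFIX_TO_FORMAT[suffix]
    raise ValueError(f"Cannot infer format from {path!r}")


def make_scratch_dir(prefix: str = "wrapper-") -> Path:
    """Create a unique scratch directory below tempfile.gettempdir()."""
    return Path(tempfile.mkdtemp(prefix=prefix, dir=tempfile.gettempdir()))
```

Failing loudly on an unknown extension keeps a mistyped `output:` path from silently producing the wrong format.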
Summary by CodeRabbit
- New Features
  - Added tools and workflows for downloading and managing Mehari transcript and ClinVar databases.
  - Introduced a workflow for annotating sequence variants using Mehari, including sample data and configuration files.
- Tests
  - Added tests to verify the correct operation of Mehari database downloads and variant annotation workflows.
- Documentation
  - Included metadata files describing tool inputs, outputs, parameters, and authorship for workflow integration.
- Chores
  - Added Conda environment files for reproducible setup of all new tools and workflows.
📝 Walkthrough
This change adds new Snakemake wrappers, environment specifications, metadata files, test data, and test cases for the Mehari bioinformatics toolkit. It introduces tools for downloading transcript and ClinVar databases and for annotating sequence variants, enabling reproducible workflows and automated testing of Mehari-based processes.
Changes
| File(s) | Change Summary |
|---|---|
| bio/mehari/annotate-seqvars/environment.linux-64.pin.txt, bio/mehari/annotate-seqvars/environment.yaml | Added Conda environment specification and configuration files for Mehari annotate-seqvars, listing exact package versions and channels. |
| bio/mehari/annotate-seqvars/meta.yaml | Added metadata file describing the Mehari annotate-seqvars tool with inputs, outputs, parameters, and author. |
| bio/mehari/annotate-seqvars/wrapper.py | Added wrapper script that builds and executes the Mehari annotation command based on Snakemake inputs and parameters, including validation and conditional options. |
| bio/mehari/annotate-seqvars/test/Snakefile | Added Snakemake rule for running Mehari annotate-seqvars on test data, specifying inputs, outputs, parameters, logging, and wrapper usage. |
| bio/mehari/annotate-seqvars/test/resources/MT-ND2.vcf, bio/mehari/annotate-seqvars/test/resources/MT.fasta, bio/mehari/annotate-seqvars/test/resources/MT.fasta.fai | Added test resource files: mitochondrial variant VCF, corresponding FASTA reference, and its index. |
| bio/mehari/download-clinvar-db/environment.linux-64.pin.txt, bio/mehari/download-clinvar-db/environment.yaml | Added Conda environment specification and configuration files for Mehari ClinVar DB download process. |
| bio/mehari/download-clinvar-db/meta.yaml | Added metadata file for the Mehari ClinVar DB download process with parameters and outputs. |
| bio/mehari/download-clinvar-db/wrapper.py | Added wrapper script to download and extract a ClinVar database release based on parameters, including validation and logging. |
| bio/mehari/download-clinvar-db/test/Snakefile | Added Snakemake rule for downloading the Mehari ClinVar DB (SV flavour) with parameters, output directory, logging, and wrapper reference. |
| bio/mehari/download-transcript-db/environment.linux-64.pin.txt, bio/mehari/download-transcript-db/environment.yaml | Added Conda environment specification and configuration files for Mehari transcript DB download process. |
| bio/mehari/download-transcript-db/meta.yaml | Added metadata file for the Mehari transcript DB download tool with parameters and outputs. |
| bio/mehari/download-transcript-db/wrapper.py | Added wrapper script to download a transcript database binary file from a GitHub release based on parameters, with validation and logging. |
| bio/mehari/download-transcript-db/test/Snakefile | Added Snakemake rule for downloading the Mehari transcript DB with parameters, output file, logging, caching, and wrapper usage. |
| test_wrappers.py | Added three test functions to run and verify the Mehari transcript DB download, ClinVar DB download, and annotate-seqvars wrappers using Snakemake with specified targets and options. |
Sequence Diagram(s)
Mehari Annotate Seqvars Workflow
sequenceDiagram
participant Snakemake
participant Wrapper (annotate-seqvars)
participant Mehari CLI
Snakemake->>Wrapper (annotate-seqvars): Provide input files, parameters, and output path
Wrapper (annotate-seqvars)->>Wrapper (annotate-seqvars): Build Mehari command with inputs and options
Wrapper (annotate-seqvars)->>Mehari CLI: Execute command for annotation
Mehari CLI-->>Wrapper (annotate-seqvars): Annotated VCF/BCF file
Wrapper (annotate-seqvars)-->>Snakemake: Output file and logs
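The "build command with inputs and options" step in the diagram above might be sketched as follows. This is an illustration under assumptions, not the code in `wrapper.py`: `build_command` is a hypothetical helper, the `mehari annotate seqvars` base invocation is assumed, and every flag name is a placeholder that should be checked against the Mehari CLI help.

```python
import shlex
from typing import List, Mapping, Optional, Sequence


def build_command(
    base: Sequence[str],
    required: Mapping[str, str],
    optional: Mapping[str, Optional[str]],
) -> List[str]:
    """Assemble an argv list; optional flags whose value is None are skipped."""
    missing = [flag for flag, value in required.items() if not value]
    if missing:
        raise ValueError(f"Missing required values for: {', '.join(missing)}")
    argv = list(base)
    for flag, value in required.items():
        argv += [flag, value]
    for flag, value in optional.items():
        if value is not None:
            argv += [flag, value]
    return argv


# Placeholder flag names for illustration only; they are NOT confirmed
# Mehari options.
argv = build_command(
    base=["mehari", "annotate", "seqvars"],
    required={"--in": "calls.vcf", "--out": "annotated.vcf"},
    optional={"--clinvar": None, "--transcripts": "txs.bin.zst"},
)
print(shlex.join(argv))
```

Building an argv list and quoting it once with `shlex.join` avoids hand-assembled shell strings, which break on paths containing spaces.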
Mehari Download Transcript DB Workflow
sequenceDiagram
participant Snakemake
participant Wrapper (download-transcript-db)
participant GitHub Releases
Snakemake->>Wrapper (download-transcript-db): Provide version, build, source parameters, and output path
Wrapper (download-transcript-db)->>GitHub Releases: Download transcript DB binary (.bin.zst)
GitHub Releases-->>Wrapper (download-transcript-db): Transcript DB file
Wrapper (download-transcript-db)-->>Snakemake: Output file and logs
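The parameter validation and URL construction implied by this diagram could be sketched like this. The allowed values, the function name, and the URL template (pointing at `example.org`) are all hypothetical placeholders; the actual release URL scheme is not stated in this PR description.

```python
# Everything below is a hypothetical placeholder, not the real release layout.
ALLOWED_BUILDS = {"GRCh37", "GRCh38"}
ALLOWED_SOURCES = {"ensembl", "refseq"}
URL_TEMPLATE = (
    "https://example.org/releases/v{version}/txs-{build}-{source}.bin.zst"
)


def transcript_db_url(version: str, build: str, source: str) -> str:
    """Validate the parameters and return the download URL for one DB file."""
    if not version:
        raise ValueError("version must be a non-empty string")
    if build not in ALLOWED_BUILDS:
        raise ValueError(f"build must be one of {sorted(ALLOWED_BUILDS)}")
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"source must be one of {sorted(ALLOWED_SOURCES)}")
    return URL_TEMPLATE.format(
        version=version, build=build.lower(), source=source
    )
```

Validating before any network access means a typo in `params:` fails immediately instead of after a failed download.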
Mehari Download ClinVar DB Workflow
sequenceDiagram
participant Snakemake
participant Wrapper (download-clinvar-db)
participant GitHub Releases
Snakemake->>Wrapper (download-clinvar-db): Provide version, build, flavour parameters, and output directory
Wrapper (download-clinvar-db)->>GitHub Releases: Download ClinVar DB tarball
GitHub Releases-->>Wrapper (download-clinvar-db): ClinVar DB tarball
Wrapper (download-clinvar-db)->>Wrapper (download-clinvar-db): Extract tarball to output directory
Wrapper (download-clinvar-db)-->>Snakemake: Output directory and logs
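The "extract tarball to output directory" step can be illustrated with the standard library alone. This is a minimal sketch, assuming the tarball is already on disk; `extract_tarball` is a hypothetical name and the wrapper in this PR may well shell out to `tar` instead.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_tarball(archive: Path, out_dir: Path) -> List[str]:
    """Extract archive into out_dir and return the member names.

    Members that would land outside out_dir are rejected before anything
    is written.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    root = out_dir.resolve()
    # Default mode "r" detects gzip/bzip2/xz compression transparently.
    with tarfile.open(archive) as tar:
        members = tar.getmembers()
        for member in members:
            target = (root / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"Unsafe path in archive: {member.name}")
        tar.extractall(root, members=members)
    return [member.name for member in members]
```

Checking member paths up front guards against archives that contain `../` entries, which matters when the tarball comes from a remote release.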
Suggested reviewers
- fgvieira
~TODO: update conda pin once mehari 0.36.1 is available from bioconda~ done
Thanks for your contributions! :+1: