feat: add mehari wrappers

Open tedil opened this issue 6 months ago • 1 comments

This PR adds wrappers for mehari:

  • annotate-seqvars for annotating VCF/VCF.gz/BCF (sequence variants, SNVs, indels, excluding structural variants)
  • download-transcript-db for downloading transcript databases required for annotation of SO terms / consequences
  • download-clinvar-db for downloading ClinVar databases required for annotation of ClinVar VCVs

Right now there is no corresponding download-frequencies-db for gnomAD, because no publicly hosted version of those databases is available yet. There is also no annotate-strucvars wrapper yet; it will likely be added later in a separate PR.

TODO:

  • [x] Test for download-clinvar-db
  • [x] Subset transcript db and reference fasta for proper but small enough annotate-seqvars test

Btw: Does this repository make use of git lfs yet?

QC

While the contribution guidelines are more extensive, please particularly ensure that:

  • [x] test.py was updated to call any added or updated example rules in a Snakefile
  • [x] input: and output: file paths in the rules can be chosen arbitrarily
  • [x] wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:)
  • [x] temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to
  • [x] the meta.yaml contains a link to the documentation of the respective tool or command under url:
  • [x] conda environments use a minimal amount of channels and packages, in recommended ordering
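
The checklist item on inferring arguments from file extensions can be illustrated with a minimal sketch. The function name `infer_variant_format` and the returned labels are hypothetical and are not taken from the wrappers in this PR; the sketch only shows the general idea of deriving an output format from the path chosen in `output:`.

```python
def infer_variant_format(path: str) -> str:
    """Return a format label based on the file extension of `path`.

    The labels ("vcf.gz", "vcf", "bcf") are illustrative assumptions,
    not values read from the mehari wrappers.
    """
    name = path.lower()
    # Check the compound suffix first so ".vcf.gz" is not mistaken for ".gz" only.
    for suffix, label in ((".vcf.gz", "vcf.gz"), (".vcf", "vcf"), (".bcf", "bcf")):
        if name.endswith(suffix):
            return label
    raise ValueError(f"cannot infer variant file format from {path!r}")
```

A wrapper could call such a helper on `snakemake.output[0]` and translate the label into whatever option the underlying tool expects, so that users only need to pick a file name.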

Summary by CodeRabbit

  • New Features
    • Added tools and workflows for downloading and managing Mehari transcript and ClinVar databases.
    • Introduced a workflow for annotating sequence variants using Mehari, including sample data and configuration files.
  • Tests
    • Added tests to verify the correct operation of Mehari database downloads and variant annotation workflows.
  • Documentation
    • Included metadata files describing tool inputs, outputs, parameters, and authorship for workflow integration.
  • Chores
    • Added Conda environment files for reproducible setup of all new tools and workflows.

tedil avatar Jun 13 '25 11:06 tedil

📝 Walkthrough

This change adds new Snakemake wrappers, environment specifications, metadata files, test data, and test cases for the Mehari bioinformatics toolkit. It introduces tools for downloading transcript and ClinVar databases and for annotating sequence variants, enabling reproducible workflows and automated testing of Mehari-based processes.

Changes

| File(s) | Change Summary |
|---|---|
| bio/mehari/annotate-seqvars/environment.linux-64.pin.txt, bio/mehari/annotate-seqvars/environment.yaml | Added Conda environment specification and configuration files for Mehari annotate-seqvars, listing exact package versions and channels. |
| bio/mehari/annotate-seqvars/meta.yaml | Added metadata file describing the Mehari annotate-seqvars tool with inputs, outputs, parameters, and author. |
| bio/mehari/annotate-seqvars/wrapper.py | Added wrapper script that builds and executes the Mehari annotation command based on Snakemake inputs and parameters, including validation and conditional options. |
| bio/mehari/annotate-seqvars/test/Snakefile | Added Snakemake rule for running Mehari annotate-seqvars on test data, specifying inputs, outputs, parameters, logging, and wrapper usage. |
| bio/mehari/annotate-seqvars/test/resources/MT-ND2.vcf, bio/mehari/annotate-seqvars/test/resources/MT.fasta, bio/mehari/annotate-seqvars/test/resources/MT.fasta.fai | Added test resource files: mitochondrial variant VCF, corresponding FASTA reference, and its index. |
| bio/mehari/download-clinvar-db/environment.linux-64.pin.txt, bio/mehari/download-clinvar-db/environment.yaml | Added Conda environment specification and configuration files for Mehari ClinVar DB download process. |
| bio/mehari/download-clinvar-db/meta.yaml | Added metadata file for the Mehari ClinVar DB download process with parameters and outputs. |
| bio/mehari/download-clinvar-db/wrapper.py | Added wrapper script to download and extract a ClinVar database release based on parameters, including validation and logging. |
| bio/mehari/download-clinvar-db/test/Snakefile | Added Snakemake rule for downloading the Mehari ClinVar DB (SV flavour) with parameters, output directory, logging, and wrapper reference. |
| bio/mehari/download-transcript-db/environment.linux-64.pin.txt, bio/mehari/download-transcript-db/environment.yaml | Added Conda environment specification and configuration files for Mehari transcript DB download process. |
| bio/mehari/download-transcript-db/meta.yaml | Added metadata file for the Mehari transcript DB download tool with parameters and outputs. |
| bio/mehari/download-transcript-db/wrapper.py | Added wrapper script to download a transcript database binary file from a GitHub release based on parameters, with validation and logging. |
| bio/mehari/download-transcript-db/test/Snakefile | Added Snakemake rule for downloading the Mehari transcript DB with parameters, output file, logging, caching, and wrapper usage. |
| test_wrappers.py | Added three test functions to run and verify the Mehari transcript DB download, ClinVar DB download, and annotate-seqvars wrappers using Snakemake with specified targets and options. |

Sequence Diagram(s)

Mehari Annotate Seqvars Workflow

sequenceDiagram
    participant Snakemake
    participant Wrapper (annotate-seqvars)
    participant Mehari CLI

    Snakemake->>Wrapper (annotate-seqvars): Provide input files, parameters, and output path
    Wrapper (annotate-seqvars)->>Wrapper (annotate-seqvars): Build Mehari command with inputs and options
    Wrapper (annotate-seqvars)->>Mehari CLI: Execute command for annotation
    Mehari CLI-->>Wrapper (annotate-seqvars): Annotated VCF/BCF file
    Wrapper (annotate-seqvars)-->>Snakemake: Output file and logs
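
The "build command with inputs and options" step in the diagram above can be sketched roughly as follows. This is not the wrapper code from the PR: `build_command` is a hypothetical helper, the `annotate seqvars` subcommand spelling is assumed from the wrapper name, and every flag name in the usage part is a placeholder rather than a confirmed mehari option (consult the tool's own help output for the real ones).

```python
import shlex


def build_command(executable, subcommand, required, optional):
    """Assemble an argument list for a command-line tool.

    Required flags must have a value; optional flags are appended only
    when a value is set, mirroring the "conditional options" idea.
    """
    argv = [executable, *subcommand]
    for flag, value in required.items():
        if not value:
            raise ValueError(f"missing required value for {flag}")
        argv += [flag, str(value)]
    for flag, value in optional.items():
        if value:
            argv += [flag, str(value)]
    return argv


# All flag names below are placeholders, NOT confirmed mehari options.
argv = build_command(
    "mehari",
    ["annotate", "seqvars"],
    required={
        "--input-placeholder": "calls.vcf",
        "--output-placeholder": "annotated.bcf",
    },
    optional={"--clinvar-placeholder": None},
)
# shlex.join quotes arguments safely for display or shell execution.
print(shlex.join(argv))
```

Keeping the command as a list until the last moment avoids quoting mistakes when paths contain spaces, and the unset optional flag is simply omitted.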

Mehari Download Transcript DB Workflow

sequenceDiagram
    participant Snakemake
    participant Wrapper (download-transcript-db)
    participant GitHub Releases

    Snakemake->>Wrapper (download-transcript-db): Provide version, build, source parameters, and output path
    Wrapper (download-transcript-db)->>GitHub Releases: Download transcript DB binary (.bin.zst)
    GitHub Releases-->>Wrapper (download-transcript-db): Transcript DB file
    Wrapper (download-transcript-db)-->>Snakemake: Output file and logs
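
A minimal sketch of the parameter validation that could precede such a download is shown below. The set of accepted builds, the function name `transcript_db_url`, and the URL template are all assumptions made for illustration; the real release location and asset naming are defined by the wrapper in this PR and are not reproduced here.

```python
# Assumed set of genome builds; the actual wrapper may accept other values.
ALLOWED_BUILDS = {"GRCh37", "GRCh38"}


def transcript_db_url(version: str, build: str, source: str, template: str) -> str:
    """Validate parameters and fill them into a caller-supplied URL template."""
    if not version:
        raise ValueError("version must not be empty")
    if build not in ALLOWED_BUILDS:
        raise ValueError(f"build must be one of {sorted(ALLOWED_BUILDS)}, got {build!r}")
    if not source:
        raise ValueError("source must not be empty")
    return template.format(version=version, build=build, source=source)


# Hypothetical template on a placeholder host, used only to show the mechanics.
EXAMPLE_TEMPLATE = "https://example.org/releases/{version}/transcripts-{build}-{source}.bin.zst"
url = transcript_db_url("1.2.3", "GRCh38", "ensembl", EXAMPLE_TEMPLATE)
```

Failing early on an unexpected build or an empty version gives a clearer error in the Snakemake log than a failed HTTP request would.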

Mehari Download ClinVar DB Workflow

sequenceDiagram
    participant Snakemake
    participant Wrapper (download-clinvar-db)
    participant GitHub Releases

    Snakemake->>Wrapper (download-clinvar-db): Provide version, build, flavour parameters, and output directory
    Wrapper (download-clinvar-db)->>GitHub Releases: Download ClinVar DB tarball
    GitHub Releases-->>Wrapper (download-clinvar-db): ClinVar DB tarball
    Wrapper (download-clinvar-db)->>Wrapper (download-clinvar-db): Extract tarball to output directory
    Wrapper (download-clinvar-db)-->>Snakemake: Output directory and logs
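
The "extract tarball to output directory" step can be sketched with the Python standard library as below. This is an illustration under assumptions, not the wrapper's actual implementation: `extract_tarball` is a hypothetical helper, and the path check is an extra safeguard added here rather than something the PR is known to do. It requires Python 3.9 or newer.

```python
import tarfile
from pathlib import Path


def extract_tarball(archive: str, output_dir: str) -> list[str]:
    """Extract `archive` into `output_dir` and return the member names.

    Members that would resolve to a location outside `output_dir` are
    rejected before anything is written.
    """
    out = Path(output_dir).resolve()
    out.mkdir(parents=True, exist_ok=True)
    # The default mode "r" detects gzip, bzip2, or xz compression automatically.
    with tarfile.open(archive) as tar:
        members = tar.getmembers()
        for member in members:
            target = (out / member.name).resolve()
            if not target.is_relative_to(out):
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(out, members=members)
    return sorted(member.name for member in members)
```

In a wrapper, `output_dir` would typically be the directory declared in `output:`, so that Snakemake can track the extracted database as a rule output.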

Suggested reviewers

  • fgvieira

coderabbitai[bot] avatar Jun 13 '25 11:06 coderabbitai[bot]

~TODO: update conda pin once mehari 0.36.1 is available from bioconda~ done

tedil avatar Jul 01 '25 11:07 tedil

Thanks for your contributions! :+1:

fgvieira avatar Jul 02 '25 09:07 fgvieira