feat: Add mmseqs2 main workflows

Open fgvieira opened this issue 7 months ago • 1 comment
QC

While the contribution guidelines are more extensive, please particularly ensure that:

  • [x] test.py was updated to call any added or updated example rules in a Snakefile
  • [x] input: and output: file paths in the rules can be chosen arbitrarily
  • [x] wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:)
  • [x] temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to
  • [x] the meta.yaml contains a link to the documentation of the respective tool or command under url:
  • [x] conda environments use a minimal number of channels and packages, in the recommended ordering
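
As an illustration of the temporary-file item in the checklist, here is a minimal sketch of scratch space created under the location that `tempfile.gettempdir()` points to and cleaned up automatically. The helper names `run_with_tmpdir` and `demo` and the `wrapper_` prefix are hypothetical and are not taken from the wrappers in this PR.

```python
import tempfile
from pathlib import Path


def run_with_tmpdir(work):
    """Call work(tmpdir) with a scratch directory that is removed afterwards."""
    # With no explicit dir=, TemporaryDirectory creates the folder under
    # tempfile.gettempdir(), which is what the checklist recommends.
    with tempfile.TemporaryDirectory(prefix="wrapper_") as tmpdir:
        return work(Path(tmpdir))


def demo(tmpdir):
    """Hypothetical payload: write one intermediate file into the scratch dir."""
    scratch = tmpdir / "scratch.txt"
    scratch.write_text("intermediate data")
    return tmpdir, scratch.exists()
```

Calling `run_with_tmpdir(demo)` returns the path that was used and whether the scratch file existed while the work was running; the directory itself is gone once the call returns.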

fgvieira, Apr 04 '25 13:04

Walkthrough

Adds MMseqs2 Snakemake wrappers, workflow and DB metadata, Conda environment specs (YAML and linux-64 pin files), test Snakefiles, a test runner, and numerous static test fixtures/expected outputs for DB creation, search, clustering, linclust, taxonomy, and RBH.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Conda environments: bio/mmseqs2/db/environment.yaml, bio/mmseqs2/db/environment.linux-64.pin.txt, bio/mmseqs2/workflows/environment.yaml, bio/mmseqs2/workflows/environment.linux-64.pin.txt | New Conda environment YAMLs and explicit linux-64 pin files listing exact package URLs (conda-forge, bioconda), pinning mmseqs2, snakemake-wrapper-utils and runtime libraries. |
| Metadata / Manifests: bio/mmseqs2/db/meta.yaml, bio/mmseqs2/workflows/meta.yaml | New metadata files declaring name, url, description, authors, I/O schema and params (module, extra). |
| Wrappers (runtime wrappers): bio/mmseqs2/db/wrapper.py, bio/mmseqs2/workflows/wrapper.py | New Snakemake wrapper modules that normalize inputs/outputs, assemble command-line args (module, extra, threads, tmpdir), special-case DB modules, and execute mmseqs2 via shell; workflow wrapper includes module-level metadata. |
| DB tests, rules & input: bio/mmseqs2/db/test/Snakefile, bio/mmseqs2/db/test/seqs/a.fasta | New test Snakefile with rules mmseqs2_databases and mmseqs2_createdb, plus a small FASTA used as input. |
| DB tests, expected (createdb/databases): bio/mmseqs2/db/test/expected/createdb/*, bio/mmseqs2/db/test/expected/databases/* | Static expected outputs for createdb/databases (index, lookup, source, version/README, _h.* files). |
| Workflow rules (tests): bio/mmseqs2/workflows/test/Snakefile | New workflow test rules covering search, cluster, linclust, taxonomy, and rbh with multiext outputs, logs, params and wrapper references. |
| Workflow DB fixtures: bio/mmseqs2/workflows/test/db/* | Static DB fixture files (a.index, a.lookup, a.source, a_h.index, a_mapping). |
| Workflow expected outputs: bio/mmseqs2/workflows/test/expected/cluster/*, bio/mmseqs2/workflows/test/expected/linclust/*, bio/mmseqs2/workflows/test/expected/search/a.tab, bio/mmseqs2/workflows/test/expected/rbh/a.tab, bio/mmseqs2/workflows/test/expected/taxonomy/a_report | Expected FASTA files, representative sequences, alignment/tab outputs and a taxonomy report used by workflow tests. |
| Top-level tests: test_wrappers.py | New test test_mmseqs2(run) that runs both the workflows and db test suites and compares results with expected fixtures. |
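
The summary says the wrappers normalize inputs/outputs and that the test rules use multiext outputs. One plausible way to recover the single prefix that a tool expects from a group of such outputs is sketched below. This is an assumption about the approach, not the code in `wrapper.py`; the function name `common_output_prefix` is hypothetical and the file names are illustrative.

```python
import os


def common_output_prefix(paths):
    """Return the shared path prefix of a group of multiext-style outputs."""
    # commonprefix works character by character, so it stops at the first
    # position where the suffixes start to differ.
    prefix = os.path.commonprefix([str(p) for p in paths])
    # Drop a trailing "_" or "." left over from suffixes such as
    # "_rep_seq.fasta" or ".index".
    return prefix.rstrip("_.")


if __name__ == "__main__":
    outputs = ["out/a_rep_seq.fasta", "out/a_all_seqs.fasta", "out/a_cluster.tsv"]
    print(common_output_prefix(outputs))
```

A caveat of this sketch: a prefix that legitimately ends in `_` or `.` would be shortened too far, so a real wrapper may need a stricter rule.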

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Rule as Snakemake Rule
  participant Wrapper as MMseqs2 Wrapper
  participant MM as mmseqs2 CLI
  participant FS as Filesystem

  Rule->>Wrapper: invoke(inputs, params(module, extra), threads, log)
  Note over Wrapper: normalize inputs/outputs\nresolve common prefixes\nconfigure tmpdir/threads/extra
  Wrapper->>MM: mmseqs2 <module> <query> <target?> <output> --threads N <extra> (uses tmpdir)
  MM->>FS: read input files
  MM-->>FS: write outputs (DBs, tabs, FASTA, reports)
  MM-->>Wrapper: exit status
  Wrapper-->>Rule: write log, expose outputs

sequenceDiagram
  autonumber
  participant RuleDB as Snakemake DB Rule
  participant DBWrapper as MMseqs2 DB Wrapper
  participant MM as mmseqs2 CLI
  participant FS as Filesystem

  RuleDB->>DBWrapper: invoke(seqs input, params, threads, log)
  Note over DBWrapper: special-case modules:\n- databases: append thread flags\n- createdb: disable tmpdir
  DBWrapper->>MM: mmseqs2 <module> <in> <out> [--threads N] <extra>
  MM->>FS: read seqs
  MM-->>FS: emit DB artifacts (.index, .lookup, .source, _h.*)
  MM-->>DBWrapper: exit status
  DBWrapper-->>RuleDB: log and outputs
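
The argument order in the two diagrams can be sketched as a small command builder. This is a hedged illustration rather than the wrapper's actual code: the function name `build_command` and its parameters are hypothetical, the executable is assumed to be installed as `mmseqs`, and the optional `tmpdir` and `threads` arguments mirror the special cases noted in the DB diagram (createdb without a tmpdir, thread flags only where the module accepts them).

```python
import shlex


def build_command(module, inputs, output, threads=None, tmpdir=None, extra="",
                  executable="mmseqs"):
    """Assemble an argument list in the order shown in the diagrams."""
    # <executable> <module> <inputs...> <output>
    cmd = [executable, module, *inputs, output]
    # Some modules (for example createdb) take no temporary directory.
    if tmpdir is not None:
        cmd.append(tmpdir)
    # Thread flags are only appended when the caller asks for them.
    if threads is not None:
        cmd += ["--threads", str(threads)]
    # shlex.split keeps quoted values in `extra` together as single arguments.
    cmd += shlex.split(extra)
    return cmd


if __name__ == "__main__":
    print(build_command("easy-search", ["q.fasta", "t.fasta"], "res.tab",
                        threads=4, tmpdir="/tmp/x", extra="-s 7.5"))
    print(build_command("createdb", ["seqs/a.fasta"], "db/a"))
```

Returning a list instead of a formatted string keeps quoting explicit and lets the caller pass it directly to `subprocess.run`, although the wrappers in this PR are described as executing through a shell.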

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review focus suggestions:
    • bio/mmseqs2/db/wrapper.py and bio/mmseqs2/workflows/wrapper.py (input/output normalization, tmpdir handling, thread flags)
    • Snakefiles in bio/mmseqs2/db/test/ and bio/mmseqs2/workflows/test/ (correct multiext outputs and log paths)
    • test_wrappers.py (test invocation and expected-vs-actual comparisons)
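
For the expected-vs-actual comparison mentioned in the last review focus item, a minimal sketch of a byte-level directory comparison is given below. The `run` helper used by `test_wrappers.py` is not shown in this thread, so this is not its implementation; `mismatched_files` is a hypothetical name.

```python
import filecmp
from pathlib import Path


def mismatched_files(expected_dir, actual_dir):
    """List expected files that are missing from, or differ in, actual_dir."""
    expected_dir, actual_dir = Path(expected_dir), Path(actual_dir)
    problems = []
    for expected in sorted(p for p in expected_dir.rglob("*") if p.is_file()):
        rel = expected.relative_to(expected_dir)
        actual = actual_dir / rel
        # shallow=False forces a content comparison instead of trusting
        # file size and modification time.
        if not actual.is_file() or not filecmp.cmp(expected, actual, shallow=False):
            problems.append(rel.as_posix())
    return problems
```

An empty return value means every fixture under the expected directory has an identical counterpart in the output directory; extra files in the output directory are deliberately ignored in this sketch.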

Suggested reviewers

  • johanneskoester

Pre-merge checks and finishing touches

โŒ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage โš ๏ธ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Description Check โ“ Inconclusive The PR description includes a fully completed QC checklist with all eight items properly checked, confirming the author's adherence to the snakemake-wrappers contributing documentation and verification of key requirements such as test.py updates, arbitrary I/O paths, automatic argument inference, proper temporary file handling, meta.yaml documentation links, and minimal conda environments. However, the description section itself (marked "Add a description of your PR here") is entirely empty with no explanatory text about what mmseqs2 workflows are being added or what the changes accomplish, leaving this critical section of the template unfilled despite the comprehensive QC checklist completion. To resolve this, please add a descriptive paragraph explaining what mmseqs2 workflows have been added to the repository, which modules are included (e.g., easy-search, easy-cluster, easy-linclust, easy-taxonomy, easy-rbh), and a brief summary of the implementation approach. The QC checklist items are properly verified, but the descriptive text section should be filled out to provide reviewers with context about the changes beyond just the compliance checklist.
โœ… Passed checks (1 passed)
Check name Status Explanation
Title Check โœ… Passed The PR title "feat: Add mmseqs2 main workflows" follows conventional commit style with the "feat:" prefix and clearly summarizes the main change in the changeset. The title accurately reflects the primary objective of adding mmseqs2 workflows to the repository, as evidenced by the extensive workflow additions including new wrapper modules, test files, metadata configurations, and five new Snakemake rules for workflow operations (search, cluster, linclust, taxonomy, and rbh). The title is concise and specific enough that a teammate reviewing the history would understand the main contribution.
✨ Finishing touches
  • [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 949d65dccb28cd900849719d45f3b6226f9c4f91 and 1c072850c87e188309f9f729d037037e5a834e0a.

📒 Files selected for processing (1)
  • test_wrappers.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • test_wrappers.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: docs
  • GitHub Check: testing

coderabbitai[bot], Apr 04 '25 13:04