snakemake-wrappers
feat: Add mmseqs2 main workflows
QC
- [x] I confirm that I have followed the documentation for contributing to `snakemake-wrappers`.
While the contributions guidelines are more extensive, please particularly ensure that:
- [x] `test.py` was updated to call any added or updated example rules in a `Snakefile`
- [x] `input:` and `output:` file paths in the rules can be chosen arbitrarily
- [x] wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`)
- [x] temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to
- [x] the `meta.yaml` contains a link to the documentation of the respective tool or command under `url:`
- [x] conda environments use a minimal amount of channels and packages, in recommended ordering
Walkthrough
Adds MMseqs2 Snakemake wrappers, workflow and DB metadata, Conda environment specs (YAML and linux-64 pin files), test Snakefiles, a test runner, and numerous static test fixtures/expected outputs for DB creation, search, clustering, linclust, taxonomy, and RBH.
Changes
| Cohort / File(s) | Summary |
|---|---|
| **Conda environments**: `bio/mmseqs2/db/environment.yaml`, `bio/mmseqs2/db/environment.linux-64.pin.txt`, `bio/mmseqs2/workflows/environment.yaml`, `bio/mmseqs2/workflows/environment.linux-64.pin.txt` | New Conda environment YAMLs and explicit linux-64 pin files listing exact package URLs (conda-forge, bioconda), pinning mmseqs2, snakemake-wrapper-utils and runtime libraries. |
| **Metadata / Manifests**: `bio/mmseqs2/db/meta.yaml`, `bio/mmseqs2/workflows/meta.yaml` | New metadata files declaring name, url, description, authors, I/O schema and params (module, extra). |
| **Wrappers (runtime wrappers)**: `bio/mmseqs2/db/wrapper.py`, `bio/mmseqs2/workflows/wrapper.py` | New Snakemake wrapper modules that normalize inputs/outputs, assemble command-line args (module, extra, threads, tmpdir), special-case DB modules, and execute mmseqs2 via shell; workflow wrapper includes module-level metadata. |
| **DB tests: rules & input**: `bio/mmseqs2/db/test/Snakefile`, `bio/mmseqs2/db/test/seqs/a.fasta` | New test Snakefile with rules `mmseqs2_databases` and `mmseqs2_createdb`, plus a small FASTA used as input. |
| **DB tests: expected (createdb/databases)**: `bio/mmseqs2/db/test/expected/createdb/*`, `bio/mmseqs2/db/test/expected/databases/*` | Static expected outputs for createdb/databases (index, lookup, source, version/README, `_h.*` files). |
| **Workflow rules (tests)**: `bio/mmseqs2/workflows/test/Snakefile` | New workflow test rules covering search, cluster, linclust, taxonomy, and rbh with multiext outputs, logs, params and wrapper references. |
| **Workflow DB fixtures**: `bio/mmseqs2/workflows/test/db/*` | Static DB fixture files (`a.index`, `a.lookup`, `a.source`, `a_h.index`, `a_mapping`). |
| **Workflow expected outputs**: `bio/mmseqs2/workflows/test/expected/cluster/*`, `bio/mmseqs2/workflows/test/expected/linclust/*`, `bio/mmseqs2/workflows/test/expected/search/a.tab`, `bio/mmseqs2/workflows/test/expected/rbh/a.tab`, `bio/mmseqs2/workflows/test/expected/taxonomy/a_report` | Expected FASTA files, representative sequences, alignment/tab outputs and a taxonomy report used by workflow tests. |
| **Top-level tests**: `test_wrappers.py` | New test `test_mmseqs2(run)` that runs both the workflows and db test suites and compares results with expected fixtures. |
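The "multiext outputs" and "normalize inputs/outputs" entries above suggest that the wrappers derive a single output prefix from several declared files. The sketch below illustrates one way this could be done; it is not taken from the PR, and the helper name `common_output_prefix` is hypothetical. It relies only on `os.path.commonprefix` from the standard library and assumes all outputs share a stem followed by `.` or `_`.

```python
import os


def common_output_prefix(paths):
    """Return the shared path prefix of a group of output files.

    Hypothetical helper, not part of the PR. Trailing "." and "_" are
    stripped so that "out/a_rep_seq.fasta" and "out/a_cluster.tsv"
    both map to "out/a".
    """
    if not paths:
        raise ValueError("at least one output path is required")
    # commonprefix works character by character, not per path component.
    prefix = os.path.commonprefix([os.fspath(p) for p in paths])
    prefix = prefix.rstrip("._")
    # An empty prefix or a bare directory means the files share no stem.
    if not prefix or prefix.endswith("/"):
        raise ValueError(f"outputs share no usable prefix: {paths!r}")
    return prefix
```

Because the comparison is character-wise, this sketch can be fooled by unrelated stems that merely start with the same letters, and a single path is returned almost unchanged; a real wrapper would likely need stricter checks.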
Sequence Diagram(s)
sequenceDiagram
    autonumber
    participant Rule as Snakemake Rule
    participant Wrapper as MMseqs2 Wrapper
    participant MM as mmseqs2 CLI
    participant FS as Filesystem
    Rule->>Wrapper: invoke(inputs, params(module, extra), threads, log)
    Note over Wrapper: normalize inputs/outputs\nresolve common prefixes\nconfigure tmpdir/threads/extra
    Wrapper->>MM: mmseqs2 <module> <query> <target?> <output> --threads N <extra> (uses tmpdir)
    MM->>FS: read input files
    MM-->>FS: write outputs (DBs, tabs, FASTA, reports)
    MM-->>Wrapper: exit status
    Wrapper-->>Rule: write log, expose outputs
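As a rough illustration of the argument assembly shown in this diagram, the following sketch builds the command as a list. It is an assumption-laden example, not the PR's `wrapper.py`: the function name `build_workflow_command` is hypothetical, the executable defaults to `mmseqs` (the binary name shipped by MMseqs2), and the temporary directory is passed as a positional argument after the output, which is how the MMseqs2 easy workflows expect it.

```python
import shlex


def build_workflow_command(module, query, output, tmpdir, threads=1,
                           target=None, extra="", executable="mmseqs"):
    """Assemble an argument list in the order shown in the diagram.

    Hypothetical sketch; the real wrapper may order or quote things
    differently.
    """
    cmd = [executable, module, query]
    if target is not None:
        # Modules such as clustering take a single input and no target.
        cmd.append(target)
    cmd += [output, tmpdir, "--threads", str(threads)]
    # Free-form user options, e.g. taken from params.extra in a rule.
    cmd += shlex.split(extra)
    return cmd
```

Returning a list rather than a formatted string keeps quoting explicit; joining it with `shlex.join` would give a string suitable for a shell call.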
sequenceDiagram
    autonumber
    participant RuleDB as Snakemake DB Rule
    participant DBWrapper as MMseqs2 DB Wrapper
    participant MM as mmseqs2 CLI
    participant FS as Filesystem
    RuleDB->>DBWrapper: invoke(seqs input, params, threads, log)
    Note over DBWrapper: special-case modules:\n- databases: append thread flags\n- createdb: disable tmpdir
    DBWrapper->>MM: mmseqs2 <module> <in> <out> [--threads N] <extra>
    MM->>FS: read seqs
    MM-->>FS: emit DB artifacts (.index, .lookup, .source, _h.*)
    MM-->>DBWrapper: exit status
    DBWrapper-->>RuleDB: log and outputs
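The two special cases named in the note (thread flags for `databases`, no tmpdir for `createdb`) could be expressed roughly as below. This is a hedged sketch under the same assumptions as the diagram, not the code from `bio/mmseqs2/db/wrapper.py`; `build_db_command` and its parameters are hypothetical names.

```python
import shlex


def build_db_command(module, source, output, threads=1, tmpdir=None,
                     extra="", executable="mmseqs"):
    """Assemble a DB-module command with per-module special cases.

    Hypothetical sketch: `source` is a FASTA path for createdb or a
    database name for databases.
    """
    cmd = [executable, module, source, output]
    if module != "createdb" and tmpdir is not None:
        # createdb takes no temporary directory, so it is left out there.
        cmd.append(tmpdir)
    if module == "databases":
        # Only the download module gets an explicit thread flag here.
        cmd += ["--threads", str(threads)]
    cmd += shlex.split(extra)
    return cmd
```

Keeping the special cases in one small function makes it easy to see which modules deviate from the default argument layout.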
Estimated code review effort
3 (Moderate) | ~25 minutes
- Review focus suggestions:
  - `bio/mmseqs2/db/wrapper.py` and `bio/mmseqs2/workflows/wrapper.py` (input/output normalization, tmpdir handling, thread flags)
  - Snakefiles in `bio/mmseqs2/db/test/` and `bio/mmseqs2/workflows/test/` (correct multiext outputs and log paths)
  - `test_wrappers.py` (test invocation and expected-vs-actual comparisons)
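For the expected-vs-actual comparison mentioned in the last focus point, a minimal standalone check might look like the following. It is only a sketch of the idea using `filecmp` and `pathlib`; the function `diff_against_expected` is hypothetical and is not the comparison logic used by the repository's `run` test fixture.

```python
import filecmp
from pathlib import Path


def diff_against_expected(actual_dir, expected_dir):
    """List expected files that are missing from, or differ in, actual_dir.

    Hypothetical helper; files present only in actual_dir are ignored.
    """
    actual_dir, expected_dir = Path(actual_dir), Path(expected_dir)
    problems = []
    for expected in sorted(expected_dir.rglob("*")):
        if not expected.is_file():
            continue
        rel = expected.relative_to(expected_dir)
        actual = actual_dir / rel
        if not actual.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif not filecmp.cmp(expected, actual, shallow=False):
            # shallow=False forces a byte-by-byte content comparison.
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

A byte-wise comparison like this only suits deterministic outputs; files that embed timestamps or absolute paths would need to be excluded or normalized first.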
Suggested reviewers
- johanneskoester
Pre-merge checks and finishing touches
Failed checks (1 warning, 1 inconclusive)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |
| Description Check | Inconclusive | The PR description includes a fully completed QC checklist with all eight items properly checked, confirming the author's adherence to the snakemake-wrappers contributing documentation and verification of key requirements such as test.py updates, arbitrary I/O paths, automatic argument inference, proper temporary file handling, meta.yaml documentation links, and minimal conda environments. However, the description section itself (marked "Add a description of your PR here") is entirely empty with no explanatory text about what mmseqs2 workflows are being added or what the changes accomplish, leaving this critical section of the template unfilled despite the comprehensive QC checklist completion. | To resolve this, please add a descriptive paragraph explaining what mmseqs2 workflows have been added to the repository, which modules are included (e.g., easy-search, easy-cluster, easy-linclust, easy-taxonomy, easy-rbh), and a brief summary of the implementation approach. The QC checklist items are properly verified, but the descriptive text section should be filled out to provide reviewers with context about the changes beyond just the compliance checklist. |
Passed checks (1 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title Check | Passed | The PR title "feat: Add mmseqs2 main workflows" follows conventional commit style with the "feat:" prefix and clearly summarizes the main change in the changeset. The title accurately reflects the primary objective of adding mmseqs2 workflows to the repository, as evidenced by the extensive workflow additions including new wrapper modules, test files, metadata configurations, and five new Snakemake rules for workflow operations (search, cluster, linclust, taxonomy, and rbh). The title is concise and specific enough that a teammate reviewing the history would understand the main contribution. |
Finishing touches
- [ ] Generate docstrings
Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Commits
Reviewing files that changed from the base of the PR and between 949d65dccb28cd900849719d45f3b6226f9c4f91 and 1c072850c87e188309f9f729d037037e5a834e0a.
Files selected for processing (1)
- `test_wrappers.py` (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- `test_wrappers.py`
Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: docs
- GitHub Check: testing