snakemake-wrappers
feat: bowtie2 can now sort with samtools and picard
This PR shamelessly copies functionality from the bwa-mem2 wrapper: the output can be sorted or left unsorted, sorted by coordinate or by queryname, and sorted with either samtools or picard.
QC
- [x] I confirm that I have followed the documentation for contributing to snakemake-wrappers.
While the contributions guidelines are more extensive, please particularly ensure that:
- [x] test.py was updated to call any added or updated example rules in a Snakefile
- [x] input: and output: file paths in the rules can be chosen arbitrarily
- [x] wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:)
- [x] temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to
- [x] the meta.yaml contains a link to the documentation of the respective tool or command under url:
- [x] conda environments use a minimal amount of channels and packages, in recommended ordering
Summary by CodeRabbit
Release Notes
- New Features
  - Enhanced configurability for the Bowtie2 alignment tool with new sorting parameters.
  - Added support for additional font packages to improve visual output.
  - Introduced dynamic rule generation for various alignment configurations in the testing framework.
  - Updated dependencies for Bowtie2 and added new packages for improved functionality, including graphics and font libraries.
  - Added a new author to the Bowtie2 package and clarified functionality in the documentation.
  - New entry added to the genome index file for improved sequence tracking.
- Bug Fixes
  - Improved error handling for missing Bowtie2 index files and refined input validation.
- Tests
  - Expanded test coverage for Bowtie2 alignment and Sourmash tools to ensure robustness across multiple configurations.
  - Added new test functions to validate command parameters and output paths for Bowtie2 alignments.
📝 Walkthrough
This pull request introduces significant updates to the Bowtie2 alignment tool's environment configuration and functionality. Key changes include modifications to the Conda environment files (environment.linux-64.pin.txt, environment.yaml), enhancements in the meta.yaml file for parameter configurability, and the addition of dynamic rules in the Snakefile for alignment processing. Furthermore, the wrapper.py file has been updated to improve error handling and command execution. New tests have been added to test_wrappers.py to ensure comprehensive coverage of the new features and configurations.
Changes
| File Path | Change Summary |
|---|---|
| bio/bowtie2/align/environment.linux-64.pin.txt | - Added new Conda version header. - Removed several package URLs. - Added multiple new packages including fonts and libraries. - Updated existing package versions. |
| bio/bowtie2/align/environment.yaml | - Updated bowtie2 version from 2.5.4 to 2.5. - Added dependency: picard-slim =3.3. - Updated snakemake-wrapper-utils from 0.6.2 to 0.6. |
| bio/bowtie2/align/meta.yaml | - Added author: Jorge Langa. - Introduced parameters: sort_program, sort_extra, sort_order. - Updated notes section. |
| bio/bowtie2/align/test/Snakefile | - Introduced dynamic rules for alignment based on sorting programs, orders, and file extensions. |
| bio/bowtie2/align/wrapper.py | - Added error handling for missing Bowtie2 index files. - Refined input sample validation. - Expanded sorting program options. - Updated command construction and logging. |
| test_wrappers.py | - Added test function: test_bowtie2_align. - Added test function: test_sourmash_compute. - Minor formatting adjustments in existing tests. |
Possibly related PRs
- [#3101] Fixes handling of index files in the bwa-mem2 wrapper, which may relate to similar index management in the main PR's Conda environment updates.
- [#3371] Updates the hisat2 align tool to include index file handling, which aligns with the main PR's focus on managing dependencies and environment configurations.
- [#3500] Introduces a wrapper for the ngs-bits SampleSimilarity tool, which may involve similar environment management practices as seen in the main PR's updates to the Conda environment files.
Suggested reviewers
- johanneskoester
- fgvieira
I'll continue next week
@fgvieira I think I am done with this wrapper.
I've left the part about outputting extra files from bowtie2 (metrics, unaligned, unpaired, concordant and non-concordant) commented out. Some of them introduce a lot of complexity in the code because bowtie2 can output 1 or 2 fastq files depending on whether the input is SE or PE. These files can be easily obtained with samtools view and samtools fastq. Also, if you want compressed files you have to use different flags (--un-conc becomes --un-conc-gz or --un-conc-bz2). It is a headache.
Let me know if you want me to keep that functionality and I'll uncomment it.
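To illustrate the flag issue, here is a minimal sketch of picking the bowtie2 option variant from the output file extension. The helper name `read_output_flag` and the file names are hypothetical and not taken from the wrapper; only the option names (`--un`, `--un-conc` and their `-gz` / `-bz2` variants) are bowtie2's own.

```python
from pathlib import Path

# Suffix that bowtie2 appends to its read-output options for compressed files.
# The lookup table and the helper below are illustrative, not wrapper code.
COMPRESSION_SUFFIX = {".gz": "-gz", ".bz2": "-bz2"}


def read_output_flag(base_flag: str, path: str) -> str:
    """Return the option string for one extra read output.

    base_flag is an uncompressed bowtie2 option such as '--un' or '--un-conc';
    the compressed variant is chosen from the last extension of `path`.
    """
    suffix = COMPRESSION_SUFFIX.get(Path(path).suffix, "")
    return f"{base_flag}{suffix} {path}"


if __name__ == "__main__":
    print(read_output_flag("--un-conc", "unconc.fq"))
    print(read_output_flag("--un-conc", "unconc.fq.gz"))
    print(read_output_flag("--un", "unaligned.fq.bz2"))
```

This only covers the flag name. The single-end versus paired-end difference in the number of files written would still need separate handling.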
It seems you still have un-pushed commits. Can you push them?
I don't understand. All my commits are on the bowtie2_sort branch. I have nothing else pending to push on my computer.
I clicked the Update Branch button. I hope it was that.
On the environment.yaml file, the patch versions are not pinned (see my comments).
I see that we both removed the patch number, but in a comment later you appended the one for picard-slim:
- picard-slim =3.3.0
Is that correct? That one has to be fixed to 3.3.0?
Ok. I see other wrappers with picard-slim =3.3.0. Fixing that and repinning everything.
but also bowtie2 and wrapper utils
Ok. Pinned everything with the exception of samtools. The pin file stays the same after that.
Why the total re-write of the wrapper? As I see it, you only needed to add the logic for picard, like (for example) in the bwa wrapper:
https://github.com/snakemake/snakemake-wrappers/blob/7b6aa5ce164a4f1c45716ae9c56dbe6d330c08f3/bio/bwa/mem/wrapper.py#L43-L66
and change the line: https://github.com/snakemake/snakemake-wrappers/blob/7b6aa5ce164a4f1c45716ae9c56dbe6d330c08f3/bio/bowtie2/align/wrapper.py#L97
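For readers who do not want to follow the links, the kind of sort dispatch being discussed could look roughly like the sketch below. It is not the code from either wrapper: the function name `build_sort_command`, its signature, and the choice to pass the output format explicitly are assumptions made for illustration. The command-line options used (`samtools view -h`, `samtools sort -n`, `--output-fmt`, and Picard `SortSam` with `--INPUT`, `--OUTPUT`, `--SORT_ORDER`) do exist in those tools.

```python
from pathlib import Path

# Map output extensions to samtools --output-fmt values (illustrative table).
FORMATS = {".sam": "SAM", ".bam": "BAM", ".cram": "CRAM"}
SORT_ORDERS = {"coordinate", "queryname"}


def build_sort_command(sort_program, sort_order, output, sort_extra=""):
    """Build the command that reads SAM from stdin and writes `output`."""
    fmt = FORMATS.get(Path(output).suffix)
    if fmt is None:
        raise ValueError(f"Unsupported output extension: {output}")
    if sort_order not in SORT_ORDERS:
        raise ValueError(f"Unexpected sort_order: {sort_order}")

    if sort_program == "none":
        # No sorting: only convert the SAM stream to the requested format.
        parts = ["samtools view -h", f"--output-fmt {fmt}", f"-o {output}", "-"]
    elif sort_program == "samtools":
        # samtools sorts by coordinate by default; -n sorts by read name.
        parts = ["samtools sort"]
        if sort_order == "queryname":
            parts.append("-n")
        parts += [sort_extra, f"--output-fmt {fmt}", f"-o {output}", "-"]
    elif sort_program == "picard":
        parts = [
            "picard SortSam",
            sort_extra,
            "--INPUT /dev/stdin",
            f"--OUTPUT {output}",
            f"--SORT_ORDER {sort_order}",
        ]
    else:
        raise ValueError(f"Unexpected sort_program: {sort_program}")
    # Drop empty pieces (e.g. an empty sort_extra) before joining.
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    print(build_sort_command("samtools", "queryname", "mapped/a.bam"))
    print(build_sort_command("picard", "coordinate", "mapped/a.bam"))
```

In a wrapper, the returned string would be appended after a pipe to the aligner command. Writing CRAM additionally requires a reference, which this sketch leaves out.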
My initial plan was to just do either samtools view or sort, but I saw the wrapper from bwa-mem2 and I saw that it could be done for sam, bam and cram, sorted and unsorted, and with samtools and picard.
But bwa-mem2 only aligns and generates the sam file. Bowtie2 can produce extra outputs.
And now you brought to my attention fgbio SortSam.
And there are wrappers specific to a single sort program like bwa-mem2 and samblaster
What is the official view from snakemake wrappers? A monolithic wrapper that handles multiple tool combinations, or multiple silly wrappers?
If we go through the silly wrappers, we can delete 100+ lines of code just for validation.
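To get a feel for how many cases a monolithic wrapper has to validate and test, the combinations can be enumerated as in the sketch below. The value lists are assumptions based on the options named in this thread (samtools or picard, coordinate or queryname, sam/bam/cram), and `alignment_cases` is a hypothetical helper, not part of the test Snakefile. Whether every combination is actually supported by the tools is not asserted here.

```python
from itertools import product

# Assumed option values, taken from the discussion rather than from the code.
SORT_PROGRAMS = ("none", "samtools", "picard")
SORT_ORDERS = ("coordinate", "queryname")
EXTENSIONS = ("sam", "bam", "cram")


def alignment_cases():
    """List the (program, order, extension) cases a single wrapper would face."""
    cases = []
    for program, order, ext in product(SORT_PROGRAMS, SORT_ORDERS, EXTENSIONS):
        # When nothing is sorted the order is irrelevant, so keep one entry only.
        if program == "none" and order != SORT_ORDERS[0]:
            continue
        cases.append((program, order, ext))
    return cases


if __name__ == "__main__":
    for program, order, ext in alignment_cases():
        print(f"mapped/{program}.{order}.{ext}")
    print(len(alignment_cases()), "cases")
```

Under these assumptions there are 15 cases, and each additional sort program adds six more.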
If it is just different sorting programs, I'd say a single wrapper. @johanneskoester what do you think?
I would view this from the user perspective. Since it is so common to sort after alignment, it is nice to have a reasonable default for that directly built into a wrapper. But we don't need to support all combinations. Just the current state of the art (i.e. the fastest sorter) that supports all relevant output files (bam and cram).
This PR was marked as stale because it has been open for 6 months with no activity.