
feat: bowtie2 can now sort with samtools and picard

Open jlanga opened this issue 11 months ago • 15 comments

This PR shamelessly copies functionality from the bwa-mem2 wrapper: the output can be left unsorted or sorted, sorting can be by coordinate or by queryname, and the sorting itself can be done with either samtools or picard.
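As a rough illustration of the choice described above, the post-alignment command could be assembled along these lines. This is only a sketch and not the code in this PR: the function name `build_sort_command`, its arguments, and the extension-to-format table are assumptions made for the example. The parameter values mirror the `sort_program` and `sort_order` options this PR adds.

```python
import shlex
from pathlib import Path

# Hypothetical mapping from output extension to a samtools --output-fmt value.
FORMATS = {".sam": "SAM", ".bam": "BAM", ".cram": "CRAM"}


def build_sort_command(output, sort_program="none", sort_order="coordinate",
                       sort_extra="", tmpdir="tmp"):
    """Return the shell command that consumes SAM on stdin and writes `output`."""
    if sort_order not in ("coordinate", "queryname"):
        raise ValueError(f"Unexpected sort order: {sort_order}")
    suffix = Path(output).suffix
    if suffix not in FORMATS:
        raise ValueError(f"Unexpected output extension: {suffix}")
    out = shlex.quote(str(output))
    tmp = shlex.quote(str(tmpdir))

    if sort_program == "none":
        # No sorting: only convert the stream to the requested format.
        parts = ["samtools view", f"--output-fmt {FORMATS[suffix]}", f"-o {out}", "-"]
    elif sort_program == "samtools":
        # samtools sorts by coordinate by default; -n switches to read name.
        by_name = "-n" if sort_order == "queryname" else ""
        parts = ["samtools sort", by_name, sort_extra,
                 f"--output-fmt {FORMATS[suffix]}", f"-T {tmp}/sort", f"-o {out}", "-"]
    elif sort_program == "picard":
        parts = ["picard SortSam", sort_extra, "--INPUT /dev/stdin",
                 f"--OUTPUT {out}", f"--SORT_ORDER {sort_order}", f"--TMP_DIR {tmp}"]
    else:
        raise ValueError(f"Unexpected sort program: {sort_program}")

    # Drop empty pieces so optional flags do not leave double spaces.
    return " ".join(p for p in parts if p)
```

In a wrapper, the returned string would be placed after a pipe from the bowtie2 call; how the real wrapper wires that up is not shown here.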

QC

While the contributions guidelines are more extensive, please particularly ensure that:

  • [x] test.py was updated to call any added or updated example rules in a Snakefile
  • [x] input: and output: file paths in the rules can be chosen arbitrarily
  • [x] wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:)
  • [x] temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to
  • [x] the meta.yaml contains a link to the documentation of the respective tool or command under url:
  • [x] conda environments use a minimal amount of channels and packages, in recommended ordering

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced configurability for the Bowtie2 alignment tool with new sorting parameters.
    • Added support for additional font packages to improve visual output.
    • Introduced dynamic rule generation for various alignment configurations in the testing framework.
    • Updated dependencies for Bowtie2 and added new packages for improved functionality, including graphics and font libraries.
    • Added a new author to the Bowtie2 package and clarified functionality in the documentation.
    • New entry added to the genome index file for improved sequence tracking.
  • Bug Fixes

    • Improved error handling for missing Bowtie2 index files and refined input validation.
  • Tests

    • Expanded test coverage for Bowtie2 alignment and Sourmash tools to ensure robustness across multiple configurations.
    • Added new test functions to validate command parameters and output paths for Bowtie2 alignments.

jlanga avatar Dec 03 '24 17:12 jlanga

📝 Walkthrough

This pull request introduces significant updates to the Bowtie2 alignment tool's environment configuration and functionality. Key changes include modifications to the Conda environment files (environment.linux-64.pin.txt, environment.yaml), enhancements in the meta.yaml file for parameter configurability, and the addition of dynamic rules in the Snakefile for alignment processing. Furthermore, the wrapper.py file has been updated to improve error handling and command execution. New tests have been added to test_wrappers.py to ensure comprehensive coverage of the new features and configurations.

Changes

File path and change summary:

bio/bowtie2/align/environment.linux-64.pin.txt
  • Added new Conda version header.
  • Removed several package URLs.
  • Added multiple new packages including fonts and libraries.
  • Updated existing package versions.

bio/bowtie2/align/environment.yaml
  • Updated bowtie2 version from 2.5.4 to 2.5.
  • Added dependency: picard-slim =3.3.
  • Updated snakemake-wrapper-utils from 0.6.2 to 0.6.

bio/bowtie2/align/meta.yaml
  • Added author: Jorge Langa.
  • Introduced parameters: sort_program, sort_extra, sort_order.
  • Updated notes section.

bio/bowtie2/align/test/Snakefile
  • Introduced dynamic rules for alignment based on sorting programs, orders, and file extensions.

bio/bowtie2/align/wrapper.py
  • Added error handling for missing Bowtie2 index files.
  • Refined input sample validation.
  • Expanded sorting program options.
  • Updated command construction and logging.

test_wrappers.py
  • Added test function: test_bowtie2_align.
  • Added test function: test_sourmash_compute.
  • Minor formatting adjustments in existing tests.

Possibly related PRs

  • [#3101] Fixes handling of index files in the bwa-mem2 wrapper, which may relate to similar index management in the main PR's Conda environment updates.
  • [#3371] Updates the hisat2 align tool to include index file handling, which aligns with the main PR's focus on managing dependencies and environment configurations.
  • [#3500] Introduces a wrapper for the ngs-bits SampleSimilarity tool, which may involve similar environment management practices as seen in the main PR's updates to the Conda environment files.

Suggested reviewers

  • johanneskoester
  • fgvieira


coderabbitai[bot] avatar Dec 03 '24 17:12 coderabbitai[bot]

I'll continue next week

jlanga avatar Dec 06 '24 10:12 jlanga

@fgvieira I think I am done with this wrapper.

I've left commented out the part that outputs extra files from bowtie2 (metrics, unaligned, unpaired, concordant and non-concordant reads). Some of them introduce a lot of complexity in the code because bowtie2 can output 1 or 2 fastq files depending on whether the input is SE or PE. These files can be easily obtained with samtools view and samtools fastq. Also, if you want compressed files you have to use different flags (--un-conc becomes --un-conc-gz or --un-conc-bz2). It is a headache.

Let me know if you want to keep that functionality and I will uncomment it.
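To make the flag problem concrete, here is a minimal sketch of the bookkeeping involved. The helper name `bowtie2_output_flag` is hypothetical and this is not the commented-out code from the PR. It only covers the gzip and bzip2 variants mentioned above.

```python
# Compression suffixes that bowtie2 appends to its read-output flags,
# e.g. --un-conc -> --un-conc-gz / --un-conc-bz2.
COMPRESSION_SUFFIXES = {".gz": "-gz", ".bz2": "-bz2"}


def bowtie2_output_flag(base_flag, path):
    """Pick the flag variant that matches the compression of `path`."""
    for extension, flag_suffix in COMPRESSION_SUFFIXES.items():
        if path.endswith(extension):
            return f"{base_flag}{flag_suffix} {path}"
    # No recognised compression extension: use the plain flag.
    return f"{base_flag} {path}"
```

Every optional output (--un, --al, --un-conc, --al-conc) would need the same treatment, and the paired-end case adds a second file on top of that, which is where the complexity comes from.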

jlanga avatar Dec 19 '24 22:12 jlanga

It seems you still have un-pushed commits. Can you push them?

fgvieira avatar Dec 20 '24 10:12 fgvieira

I don't understand. All my commits are on the bowtie2_sort branch. I have nothing else pending to push on my computer.

jlanga avatar Dec 20 '24 10:12 jlanga

I clicked the Update Branch button. I hope it was that.

jlanga avatar Dec 20 '24 10:12 jlanga

On the environment.yaml file, the patch versions are not pinned (see my comments).

fgvieira avatar Dec 20 '24 11:12 fgvieira

I see that we both removed the patch number, but in a comment later you appended the one for picard-slim:

- picard-slim =3.3.0

Is that correct? Does that one have to be pinned to 3.3.0?

jlanga avatar Dec 20 '24 11:12 jlanga

Ok. I see other wrappers with picard-slim =3.3.0. Fixing that and repinning everything.

jlanga avatar Dec 20 '24 11:12 jlanga

but also bowtie2 and wrapper utils

fgvieira avatar Dec 20 '24 11:12 fgvieira

Ok. Pinned everything with the exception of samtools. The pin file stays the same after that.
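For readers following the pinning exchange, the difference between `picard-slim =3.3` and `picard-slim =3.3.0` can be checked mechanically. The sketch below assumes dependency strings written in the `name =version` form quoted in this thread. The helper `patch_is_pinned` is made up for illustration and is not part of the repository's tooling.

```python
import re

# Matches specs such as "picard-slim =3.3.0" or "bowtie2 =2.5".
SPEC = re.compile(r"^(?P<name>[A-Za-z0-9_.\-]+)\s*=\s*(?P<version>\d+(?:\.\d+)*)$")


def patch_is_pinned(spec):
    """Return True if the spec pins major, minor and patch (e.g. =3.3.0)."""
    match = SPEC.match(spec.strip())
    if match is None:
        # No version given at all, e.g. a bare "samtools".
        return False
    return len(match.group("version").split(".")) >= 3
```

With this check, `picard-slim =3.3.0` counts as pinned, while `bowtie2 =2.5` and an unversioned `samtools` do not.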

jlanga avatar Dec 20 '24 11:12 jlanga

Why the total re-write of the wrapper? As I see it, you only needed to add the logic for picard, like (for example) in the bwa wrapper: https://github.com/snakemake/snakemake-wrappers/blob/7b6aa5ce164a4f1c45716ae9c56dbe6d330c08f3/bio/bwa/mem/wrapper.py#L43-L66

and change the line: https://github.com/snakemake/snakemake-wrappers/blob/7b6aa5ce164a4f1c45716ae9c56dbe6d330c08f3/bio/bowtie2/align/wrapper.py#L97

fgvieira avatar Dec 20 '24 12:12 fgvieira

My initial plan was to just do either samtools view or sort, but I saw the wrapper from bwa-mem2 and I saw that it could be done for sam, bam and cram, sorted and unsorted, and with samtools and picard.

But bwa-mem2 only aligns and generates the sam file. Bowtie2 can produce extra outputs.

And now you brought to my attention fgbio SortSam.

And there are wrappers specific to a single sort program, like bwa-mem2 and samblaster.

What is the official view from snakemake wrappers? A monolithic wrapper that handles multiple tool combinations, or multiple silly wrappers?

If we go with the silly wrappers, we can delete 100+ lines of code that exist just for validation.

jlanga avatar Dec 20 '24 14:12 jlanga

If it is just different sorting programs, I'd say a single wrapper. @johanneskoester what do you think?

fgvieira avatar Jan 24 '25 20:01 fgvieira

I would view this from the user perspective. Since it is so common to sort after alignment, it is nice to have a reasonable default for that directly built into a wrapper. But we don't need to support all combinations. Just the current state of the art (i.e. the fastest sorter) that supports all relevant output files (bam and cram).

johanneskoester avatar Feb 03 '25 14:02 johanneskoester

This PR was marked as stale because it has been open for 6 months with no activity.

github-actions[bot] avatar Sep 01 '25 01:09 github-actions[bot]