atacseq Adding peak-calling with Genrich (fix #108) and multi-mapping read analysis

Adding peak-calling with Genrich (fix #108) and multi-mapping read analysis

Open samuelruizperez opened this issue 10 months ago • 1 comments

This PR:

Addresses issue #108:

This version of the pipeline keeps the current pipeline structure the same but adds a separate "branch" for peak-calling with Genrich and its necessary up and downstream processes.
- After merging resequenced libraries, BAMs are sorted by name but not filtered. Duplicate and blacklisted region removal is left to Genrich to allow multimapping read analysis.
- Currently, normalised bigWig coverage tracks are generated for these unfiltered BAMs and added to IGV. Here perhaps it would be more appropriate to replace these coverage files with normalised versions of the pileup.bedGraph files generated by Genrich? (see Genrich/issues/71)
- By default, biological replicates are analysed collectively by Genrich, so replicate merging and consensus peak analysis is skipped. Optionally, peak calling can be performed separately for each biological replicate by setting the parameter --skip_genrich_sep to false.
- Currently, this module generates narrow peaks by default. Maybe a broad peak option could be added by modifying the peak calling parameters based on https://github.com/jsh58/Genrich/issues/58#issuecomment-642929638 ?.
- featureCounts data for Genrich's output is a work in progress.
NOTES:
- This PR uses this version of the nf-core module: https://github.com/nf-core/modules/pull/3720.
- Based on #301, it would seem that using Genrich's ATAC-seq mode for peak calling would solve:
  - #164
  - #169
  - part of #91

Adds a parameter --analyze_multimappers <Int> to include multimapping reads (secondary alignments) in Genrich's peak calling process (see Genrich#multimap). Currently, it only works with --aligner bowtie2 or --aligner star, the option with Chromap is a work in progress.

For example, setting --analyze_multimappers 50, will run Bowtie2 with -k 50 (or STAR with --outFilterMultimapNmax 50 --winAnchorMultimapNmax 100) and Genrich with -s 50. This means the aligner will report 50 distinct valid alignments for each read, and Genrich will keep secondary alignments whose scores are within 50 of the best for peak calling.

Secondary alignments will still be filtered out with samtools view -q 1 -F 4 -F 256 before calling peaks with MACS2 unless --keep_multi_map is set.

NOTES:
- I have tested --analyze_multimappers with all the test datasets without issues, but when using other experimental data I have found that the allocated resources for Bowtie2 alignment (withLabel:process_high) might not be enough. Usually, it runs out of time or memory. Perhaps we could set up a conditional label/process resource allocation when this parameter is used with Bowtie2?

PR checklist

[x] This comment contains a description of changes (with reason).
[x] If you've added a new tool - have you followed the pipeline conventions in the contribution docs
[ ] Make sure your code lints (nf-core lint).
[ ] Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
[ ] Usage Documentation in docs/usage.md is updated.
[ ] Output Documentation in docs/output.md is updated.
[ ] CHANGELOG.md is updated.
[ ] README.md is updated (including new tool citations and authors/contributors).

I would really appreciate it if you could take a look at these changes. This is my first attempt at a contribution to nf-core, so please let me know if have made any basic mistakes or if I could do something better to follow the guidelines.

Thanks!

@samuelruizperez

Aug 22 '23 15:08 samuelruizperez

atacseq atacseq copied to clipboard

Adding peak-calling with Genrich (fix #108) and multi-mapping read analysis

PR checklist

atacseq
atacseq copied to clipboard