chipseq icon indicating copy to clipboard operation
chipseq copied to clipboard

Improve fragment size handling for MACS2 peak calling

Open ppericard opened this issue 5 years ago • 8 comments

Right now, the fragment size can be a main input parameter of the nf-core/chipseq pipeline. However, if this parameter is supplied, MACS2 callpeaks is still set-up to automatically guess the fragment size. If the IP signal is not strong enough, or the coverage is not very high, the detection algorithm can easily fail to detect or guess correctly the fragment size which leads to the peak-calling step crashing.

The solution I'm suggesting is: if the user inputs the fragment size then it could be passed to MACS2 callpeaks like this:

--nomodel --extsize $params.fragment_size

Or at least, the user should have the option to deactivate automatic detection in MACS2 callpeaks and pass the input fragment size

ppericard avatar Nov 02 '20 10:11 ppericard

Hi @ppericard, I intentionally avoided passing the fragment size to the peak calling because one of the main advantages of MACS2 is that it is able to calculate the shifting model itself. The fragment size is actually used for other processes such as creating the bigWig tracks and for some QC plots with deepTools where this information will not be known.

A better solution would be to let the user decide which parameters they would like to provide MACS2 in these instances, and that will be possible in the next release anyway since it will be written in DSL2.

drpatelh avatar Nov 02 '20 12:11 drpatelh

Hi @drpatelh. I understand the logic. I'm just surprised no-one was stopped by this pb before. I have IP tracks with very clear signal but macs2 cannot identify correctly the shifting model. I have no idea why. I ended up forking the repo and modifying the command line for macs2 callpeaks so I can perform the peak-calling "manually" by passing the fragment size. I'll be watching closely for the next release since going to DSL2 will allow much more flexibility in terms of using all the tools parameters.

ppericard avatar Nov 02 '20 12:11 ppericard

Yeah, sorry. Its a fine balance between providing a generic set of parameters that can be used to tune the behaviour of the pipeline, maintaining best practice and trying to factor in everyone's use cases. Defo should be easier in the next release.

drpatelh avatar Nov 02 '20 12:11 drpatelh

Hi @drpatelh, is it possible to reopen this issue?

I have also had problems with smaller genome that was workable by specifying --nomodel and --extsize

https://github.com/macs3-project/MACS/issues/353

Now that the release is in DSL2, maybe it can be implemented? Here is what worked for me, but I haven't worked on the details, and I'm new to this so I don't know if this is optimal: https://github.com/nf-core/chipseq/compare/master...johannesnicolaus:nf-core-chipseq:develop

johannesnicolaus avatar Apr 20 '23 00:04 johannesnicolaus

Hi @johannesnicolaus ! Pinging @JoseEspinosa here because he has been responsible for bringing the pipeline to DSL2. Might be easier to have a single parameter that sets non-mandatory options for MACS2 e.g. --extra_macs2_callpeak_args . Could have similar options for other commonly changed tools. Just need to sanity check that providing some of these options in particular combinations still works.

Have you tried overriding these options via a custom config file? This is one of the benefits of using the DSL2 implementation.

drpatelh avatar Apr 20 '23 07:04 drpatelh

Thanks for the reply! Do you mean to set these options via a custom config file? I haven't tried that yet because changing the modules.config and nextflow.config was more straightforward to me. Maybe I can try that as well.

johannesnicolaus avatar Apr 20 '23 07:04 johannesnicolaus

Yes, I think more people are modifying macs2 parameters using a custom config. Actually, this slack exchange might be interesting for you: https://nfcore.slack.com/archives/CFP55NZ5G/p1652280017471059?thread_ts=1652184721.335959&cid=CFP55NZ5G

About implementing the --extra_macs2_callpeak_args parameter, at a first glance, I would be more in favor of using a custom config since the validation of the extra parameters could be a little bit complicated and also would need to keep them always updated with the last pipeline installed macs2 version :thinking:

JoseEspinosa avatar Apr 20 '23 07:04 JoseEspinosa

Thanks! Seems good for now

johannesnicolaus avatar Apr 20 '23 12:04 johannesnicolaus

Can be tuned from 2.0.0 release

JoseEspinosa avatar Jul 24 '24 10:07 JoseEspinosa