Improve fragment size handling for MACS2 peak calling
Right now, the fragment size can be a main input parameter of the nf-core/chipseq pipeline. However, if this parameter is supplied, MACS2 callpeaks is still set-up to automatically guess the fragment size. If the IP signal is not strong enough, or the coverage is not very high, the detection algorithm can easily fail to detect or guess correctly the fragment size which leads to the peak-calling step crashing.
The solution I'm suggesting is: if the user inputs the fragment size then it could be passed to MACS2 callpeaks like this:
--nomodel --extsize $params.fragment_size
Or at least, the user should have the option to deactivate automatic detection in MACS2 callpeaks and pass the input fragment size
Hi @ppericard, I intentionally avoided passing the fragment size to the peak calling because one of the main advantages of MACS2 is that it is able to calculate the shifting model itself. The fragment size is actually used for other processes such as creating the bigWig tracks and for some QC plots with deepTools where this information will not be known.
A better solution would be to let the user decide which parameters they would like to provide MACS2 in these instances, and that will be possible in the next release anyway since it will be written in DSL2.
Hi @drpatelh. I understand the logic. I'm just surprised no-one was stopped by this pb before. I have IP tracks with very clear signal but macs2 cannot identify correctly the shifting model. I have no idea why. I ended up forking the repo and modifying the command line for macs2 callpeaks so I can perform the peak-calling "manually" by passing the fragment size. I'll be watching closely for the next release since going to DSL2 will allow much more flexibility in terms of using all the tools parameters.
Yeah, sorry. Its a fine balance between providing a generic set of parameters that can be used to tune the behaviour of the pipeline, maintaining best practice and trying to factor in everyone's use cases. Defo should be easier in the next release.
Hi @drpatelh, is it possible to reopen this issue?
I have also had problems with smaller genome that was workable by specifying --nomodel and --extsize
https://github.com/macs3-project/MACS/issues/353
Now that the release is in DSL2, maybe it can be implemented? Here is what worked for me, but I haven't worked on the details, and I'm new to this so I don't know if this is optimal: https://github.com/nf-core/chipseq/compare/master...johannesnicolaus:nf-core-chipseq:develop
Hi @johannesnicolaus ! Pinging @JoseEspinosa here because he has been responsible for bringing the pipeline to DSL2. Might be easier to have a single parameter that sets non-mandatory options for MACS2 e.g. --extra_macs2_callpeak_args . Could have similar options for other commonly changed tools. Just need to sanity check that providing some of these options in particular combinations still works.
Have you tried overriding these options via a custom config file? This is one of the benefits of using the DSL2 implementation.
Thanks for the reply! Do you mean to set these options via a custom config file? I haven't tried that yet because changing the modules.config and nextflow.config was more straightforward to me. Maybe I can try that as well.
Yes, I think more people are modifying macs2 parameters using a custom config. Actually, this slack exchange might be interesting for you: https://nfcore.slack.com/archives/CFP55NZ5G/p1652280017471059?thread_ts=1652184721.335959&cid=CFP55NZ5G
About implementing the --extra_macs2_callpeak_args parameter, at a first glance, I would be more in favor of using a custom config since the validation of the extra parameters could be a little bit complicated and also would need to keep them always updated with the last pipeline installed macs2 version :thinking:
Thanks! Seems good for now
Can be tuned from 2.0.0 release