MACS
Q: Different read lengths for input and IP sample, would it affect accuracy of peak calls?
For a recent ChIP-seq experiment we sequenced the input and IP samples with different read lengths (IP was 125 bp paired-end, input was 50 bp single-end). When calling peaks with mostly default options, peaks were called successfully, but MACS2 printed a warning indicating that "since the calculated d (53) from paired-peaks are smaller than 2*tag length, etc.". It suggested an alternative 'd' that was longer (though still shorter than 2*tag length, depending on which tag length is used).
My first thought is that this would just affect the distance over which nearby peaks are merged together, and not necessarily the calls themselves, but any informed thoughts on whether this difference in read length will have a discernible effect on peak calls would be appreciated.
@tjten Since you see this warning, it means you ran MACS2 in single-end mode. In this case, for each pair of reads, only the 1st mate (bitwise flag 64) is kept. The model is then built by looking for the best shift between the reads mapped to the plus strand and the reads mapped to the minus strand. d=53 is too short in most cases. You can check whether one of the other 'alternative d' values works (in most ChIP-seq cases, it should be around 200 to 300 bp). However, my suggestion for your data analysis is to run MACS2 in paired-end mode (-f BAMPE or -f BEDPE), since your IP is PE. That way, MACS2 does not have to build the shifting model (--nomodel is turned on automatically).
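To make the "bitwise flag 64" remark concrete, here is a minimal Python sketch of how the SAM FLAG bits could be tested to decide which reads survive under the rule described above. The flag constants come from the SAM format; the function names are hypothetical, and treating unpaired reads as always kept is an assumption for illustration. This is not MACS2's actual code.

```python
# SAM FLAG bits (from the SAM format specification).
FLAG_PAIRED = 0x1        # read is part of a pair
FLAG_REVERSE = 0x10      # read maps to the minus strand
FLAG_FIRST_MATE = 0x40   # decimal 64: first mate of the pair


def kept_in_single_end_mode(flag: int) -> bool:
    """Hypothetical helper: apply the 'keep only the 1st mate' rule.

    Assumption: reads that are not part of a pair are kept as they are.
    """
    if not flag & FLAG_PAIRED:
        return True
    return bool(flag & FLAG_FIRST_MATE)


def strand(flag: int) -> str:
    """Return '+' or '-' depending on the reverse-strand bit."""
    return "-" if flag & FLAG_REVERSE else "+"


if __name__ == "__main__":
    # 99 and 147 are the flags of a typical properly paired mate 1 / mate 2.
    for flag in (99, 147, 0, 16):
        print(flag, kept_in_single_end_mode(flag), strand(flag))
```

With this rule, a 125 bp paired-end library contributes only one read per fragment, which is why the plus/minus strand shift model is still built from single tags.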
I was very excited to come across this answer, but when I tried running both a PE and an SE read file in paired-end mode (-f BAMPE), I noticed that all the unpaired reads were discarded by MACS. :(
Is there an option to have MACS use a hybrid model of paired and single ends? For example use the real fragments from the paired end file but extend (and keep) reads from the single end file?
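To clarify what "extend" means here, this is a rough Python sketch of extending a single-end read to an assumed fragment length in the direction of its strand. The function name and the `fragment_len` value are hypothetical (for example, it could be taken from the mean fragment size of the paired-end library); it only illustrates the idea and is not an existing MACS option.

```python
def extend_read(start: int, read_len: int, strand: str, fragment_len: int):
    """Return the (start, end) of the inferred fragment, 0-based half-open.

    A plus-strand read is extended to the right from its start; a
    minus-strand read is extended to the left from its end.
    """
    if strand == "+":
        return start, start + fragment_len
    end = start + read_len
    # Clamp at 0 so the fragment does not run off the chromosome start.
    return max(0, end - fragment_len), end


if __name__ == "__main__":
    # A 50 bp read extended to an assumed 200 bp fragment.
    print(extend_read(1000, 50, "+", 200))
    print(extend_read(1000, 50, "-", 200))
```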