MACS
Running MACS2 on RNA-Seq BAM files
Hi,
I am aware that MACS2 is designed for finding peaks in ChIP-Seq data. However, when I run MACS2 on RNA-Seq data using default parameters, it seems to work as I expected. My question is: do you see any problem with this approach?
Thanks
Just bear in mind that you are only analysing reads that map to the transcriptome. I don't think the assumptions made in the lambda background model hold true, as you are only considering a non-contiguous subset of the genome. I suppose the other question is: what do the results mean? There is also no control. I think my rambling reply boils down to the question... why use MACS2 on RNA-seq data? :)
@IanCodes Thank you for the quick comment. What do you mean by a non-contiguous set? And do you mean it is better to skip the lambda modelling when finding peaks in RNA-Seq data?
#Why use MACS2 on RNA-Seq
I am trying to find genomic repeats with reliable expression patterns. Initially, I did this just by counting the RNA-Seq reads that overlap genomic repeats, but with this approach most of the expressed repeats seem to fall in regions where the RNA-Seq signal is flat. I hypothesised that if a TE is expressed, it should show some kind of peak pattern, depending on where it is in the genome. Here is an example that gives the big picture: [image: screenshot] https://cloud.githubusercontent.com/assets/3885659/24951044/5f1d8190-1f73-11e7-9ba3-06c093e601db.png
In the highlighted figure, the black squares are the repeat elements and green indicates the RNA-Seq peaks found by MACS2.
Sorry I wasn't clear. MACS2 estimates its background (lambda) at several scales: within the enriched region, within 1 kb and 10 kb windows around the region, and across the whole genome. However, transcriptome coverage is patchy across the genome, so I think the lambda background would be estimated incorrectly. You can turn off the local lambda estimation using --nolambda (I think), in which case only the genome-wide background is used.
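To make the concern concrete, here is a minimal Python sketch of a dynamic background of the kind described above. It is not MACS2's code: the function names, the toy read positions, the 200 bp candidate width and the 1 Mb genome size are all assumptions for illustration, and the real implementation differs in detail (for example, in how it treats samples without a control). The expected count in a candidate window is taken as the maximum of the genome-wide rate and the rates in larger surrounding windows, each scaled to the candidate width.

```python
import math

def scaled_rate(read_starts, centre, window, peak_width):
    """Count reads in `window` bp centred on `centre`, scaled to `peak_width` bp."""
    half = window / 2
    n = sum(1 for p in read_starts if centre - half <= p < centre + half)
    return n * peak_width / window

def local_lambda(read_starts, centre, peak_width, genome_size,
                 windows=(1_000, 10_000), nolambda=False):
    """Expected reads in the candidate window under the background model."""
    lambda_bg = len(read_starts) * peak_width / genome_size
    if nolambda:
        # Only the genome-wide rate is used, as with --nolambda.
        return lambda_bg
    local = [scaled_rate(read_starts, centre, w, peak_width) for w in windows]
    return max([lambda_bg] + local)

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the exact sum."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

# Toy data (hypothetical): one flat, evenly covered transcript in an
# otherwise empty 1 Mb genome.
reads = [4000 + 40 * i for i in range(50)]
observed = sum(1 for p in reads if 4900 <= p < 5100)

with_local = local_lambda(reads, 5000, 200, 1_000_000)
genome_only = local_lambda(reads, 5000, 200, 1_000_000, nolambda=True)

print("local lambda:", with_local, "p =", poisson_sf(observed, with_local))
print("genome-wide lambda:", genome_only, "p =", poisson_sf(observed, genome_only))
```

In this toy case the same flat stretch of coverage looks unremarkable against its local background but highly significant against the genome-wide rate, because most of the genome has no reads at all. That is the sense in which patchy transcriptome coverage can distort the background estimate.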
Now that you have explained your aims, MACS2 could give you what you need to find candidates, a bit like an ATAC-seq analysis, which also does not use a control. It might be worth screening any output against the ENCODE blacklist regions, if they are available for this genome.
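The screening step could look something like the sketch below, which drops any peak that overlaps a blacklisted interval. The function names and the toy intervals are hypothetical; real input would come from the peak file and a blacklist BED file for the matching genome build. Coordinates are treated as half-open, as in BED.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end

def filter_blacklisted(peaks, blacklist):
    """Return the peaks that overlap no blacklisted interval.

    Both arguments are lists of (chrom, start, end) tuples.
    """
    by_chrom = {}
    for chrom, start, end in blacklist:
        by_chrom.setdefault(chrom, []).append((start, end))
    kept = []
    for chrom, start, end in peaks:
        hits = by_chrom.get(chrom, [])
        if not any(overlaps(start, end, s, e) for s, e in hits):
            kept.append((chrom, start, end))
    return kept

# Toy intervals (hypothetical), not taken from any real blacklist.
peaks = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 100, 300)]
blacklist = [("chr1", 250, 400)]
clean_peaks = filter_blacklisted(peaks, blacklist)
```

A linear scan per peak is fine for a few thousand peaks; for larger sets an interval tree or a dedicated tool would be the more usual choice.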
@IanCodes
Does the ENCODE 'blacklist' mean unmappable regions?
Thank you for emphasising the lambda modelling. I will test how it affects the number of peaks and then decide whether to use it or not.
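One simple way to run that comparison is to count the peak records from two runs, one with and one without --nolambda. The sketch below assumes BED-like peak output with one peak per line; the file names in the comment are hypothetical, and the helper takes any iterable of lines so it can be tried without real files.

```python
def count_peaks(lines):
    """Count peak records, skipping blank lines and comment/header lines."""
    skip_prefixes = ("#", "track", "browser")
    return sum(
        1 for line in lines
        if line.strip() and not line.startswith(skip_prefixes)
    )

# Usage with real files would look like this (hypothetical names):
#   with open("default_peaks.narrowPeak") as fh:
#       n_default = count_peaks(fh)
#   with open("nolambda_peaks.narrowPeak") as fh:
#       n_nolambda = count_peaks(fh)

# Toy records (hypothetical) standing in for two runs.
default_run = ["track name=demo\n", "chr1\t100\t300\n", "\n"]
nolambda_run = ["chr1\t100\t300\n", "chr1\t1000\t1200\n", "chr2\t50\t250\n"]
n_default = count_peaks(default_run)
n_nolambda = count_peaks(nolambda_run)
```

The raw counts only say how many peaks each setting calls; looking at which peaks are gained or lost is likely to be more informative than the totals alone.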
It is more complicated than mappability, see: https://sites.google.com/site/anshulkundaje/projects/blacklists
That said, the summary there suggests the blacklists are not applicable to RNA-seq data. But then, as these are transcriptome-associated reads, you might be OK.
Yes, I think this blacklist won't affect the RNA-Seq analysis. Thanks @IanCodes!
Hi Gireesh @gireeshkbogu, I am also trying to call peaks on RNA-seq datasets using MACS2. Could you please share your parameter settings with me? I would really appreciate it. Thank you!
Best regards, Min
@gireeshkbogu @genecell Same!