
Running MACS2 on RNA-Seq BAM files (taoliu/MACS issue #186: https://github.com/taoliu/MACS/issues/186)

Open • gireeshkbogu opened this issue 7 years ago • 8 comments

Hi,

I am aware that MACS2 is designed for finding peaks in ChIP-Seq data. However, when I run MACS2 on RNA-Seq data with default parameters, it seems to work as I expected. My question is: do you see any problem with this approach?

Thanks

gireeshkbogu • Apr 12 '17 08:04

Just bear in mind that you are only analysing reads that map to the transcriptome. I don't think the assumptions made in the lambda background model hold true, as you are only considering a non-contiguous subset of the genome. I suppose the other question is: what do the results mean? There is also no control. I think my rambling reply boils down to one question: why use MACS2 on RNA-seq data? :)

IanCodes • Apr 12 '17 09:04

@IanCodes Thank you for the quick comment. What do you mean by a non-contiguous subset? And do you mean it is better to turn off lambda modelling when finding peaks in RNA-Seq data?

Why use MACS2 on RNA-Seq?

I am trying to find genomic repeats with reliable expression patterns. Initially, I did this just by counting RNA-Seq reads overlapping genomic repeats, but with this approach most of the expressed repeats seem to fall in regions where the RNA-Seq signal is flat. I hypothesised that if a TE is expressed, it should show some kind of peak pattern depending on where it is in the genome. Here is a screenshot that gives the big picture: https://cloud.githubusercontent.com/assets/3885659/24951044/5f1d8190-1f73-11e7-9ba3-06c093e601db.png. In the highlighted region, the black square marks the repeat elements and the green colour indicates the RNA-Seq peak found by MACS2.
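The overlap step described above (which repeats sit under a MACS2 peak) can be sketched in a few lines of Python. This is a minimal illustration, not the poster's actual pipeline: it assumes peaks and repeats have already been parsed into half-open (chrom, start, end) tuples, as in BED files, and `repeats_with_peaks` is a hypothetical helper name. On real files, `bedtools intersect -u` is the usual way to do this.

```python
from collections import defaultdict

# Intervals are half-open (chrom, start, end), as in BED files.
Interval = tuple[str, int, int]


def repeats_with_peaks(repeats: list[Interval], peaks: list[Interval]) -> list[Interval]:
    """Return the repeats that overlap at least one peak (hypothetical helper)."""
    # Group peak coordinates by chromosome and sort them by start position.
    peaks_by_chrom: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in peaks:
        peaks_by_chrom[chrom].append((start, end))
    for intervals in peaks_by_chrom.values():
        intervals.sort()

    hits = []
    for chrom, r_start, r_end in repeats:
        for p_start, p_end in peaks_by_chrom.get(chrom, []):
            if p_start >= r_end:
                break  # peaks are sorted by start, so no later peak can overlap
            if p_end > r_start:
                hits.append((chrom, r_start, r_end))
                break  # one overlapping peak is enough
    return hits


# Usage example with made-up coordinates.
repeats = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
peaks = [("chr1", 150, 300), ("chr2", 50, 80)]
print(repeats_with_peaks(repeats, peaks))
```

Because the intervals are half-open, a repeat ending at 50 and a peak starting at 50 are treated as adjacent, not overlapping.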

gireeshkbogu • Apr 12 '17 09:04

Sorry, I wasn't clear. MACS2 builds its background model (lambda) from the read density within the enriched region, in windows of 1 kb and 10 kb around it, and across the whole genome. However, transcriptome coverage is patchy across the genome, so I think the lambda background would be estimated incorrectly. You can turn off the local modelling, so that a fixed genome-wide lambda is used, with --nolambda (I think).
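To make the concern concrete, here is a simplified sketch of the dynamic-lambda idea described above: the local lambda is taken as the maximum of the genome-wide average and the mean coverage in 1 kb and 10 kb windows, and each position is tested with a Poisson tail probability. This is an illustration under stated assumptions, not MACS2's implementation (MACS2 works on fragment pileups, scales control reads, and behaves differently without a control). The per-base coverage list, the toy 50 kb "genome", and the helper names `poisson_sf`, `window_mean`, and `local_lambda` are all hypothetical; `use_local=False` only mimics the effect of --nolambda.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam); adequate for the small counts used here."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # Start from log P(X = k) so tiny tails do not underflow, then add k+1, k+2, ...
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, i = 0.0, k
    while term > 0.0:
        total += term
        i += 1
        term *= lam / i
        if i > lam and term < 1e-16 * total:
            break  # remaining terms are negligible
    return min(1.0, total)


def window_mean(cov: list[int], center: int, width: int) -> float:
    """Mean coverage in a window of `width` bases centred on `center`, clipped to the data."""
    lo = max(0, center - width // 2)
    hi = min(len(cov), center + width // 2)
    return sum(cov[lo:hi]) / (hi - lo)


def local_lambda(cov: list[int], center: int, windows=(1000, 10000), use_local: bool = True) -> float:
    """max(lambda_BG, lambda_1k, lambda_10k); with use_local=False, only lambda_BG."""
    lambda_bg = sum(cov) / len(cov)  # genome-wide average coverage
    if not use_local:  # mimics --nolambda: the fixed background is used everywhere
        return lambda_bg
    return max([lambda_bg] + [window_mean(cov, center, w) for w in windows])


# Toy 50 kb "genome": one expressed gene (20-25 kb, coverage 20) carrying a
# 200 bp bump (coverage 60); everything else is unexpressed (coverage 0).
cov = [0] * 50_000
for pos in range(20_000, 25_000):
    cov[pos] = 20
for pos in range(22_000, 22_200):
    cov[pos] = 60

for label, pos in [("bump", 22_100), ("flat gene body", 24_000)]:
    for use_local in (True, False):
        lam = local_lambda(cov, pos, use_local=use_local)
        p = poisson_sf(cov[pos], lam)
        print(f"{label:15} local={use_local!s:5} lambda={lam:6.2f} p={p:.2e}")
```

In this toy, the flat stretch of the gene body only looks significant against the genome-wide lambda, whereas the bump stands out under both settings. This is the kind of difference that comparing peak counts with and without lambda modelling would expose.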

Now that you have explained your aims, MACS2 could give you what you need to find candidates, a bit like an ATAC-seq analysis, which also does not use a control. It might be worth screening any output against the ENCODE blacklist regions, if they are available for this genome.
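Screening peaks against a blacklist amounts to dropping every peak that overlaps a listed region. Below is a minimal Python sketch that assumes the peaks and the blacklist are already parsed into half-open (chrom, start, end) tuples. The helper names (`build_index`, `overlaps_blacklist`, `filter_peaks`) are hypothetical, and on real BED files `bedtools intersect -v` does the same job.

```python
from bisect import bisect_left
from collections import defaultdict

# Intervals are half-open (chrom, start, end), as in BED files.
Interval = tuple[str, int, int]
Index = dict[str, tuple[list[int], list[int]]]


def build_index(blacklist: list[Interval]) -> Index:
    """Merge overlapping blacklist intervals per chromosome; keep sorted starts/ends."""
    by_chrom: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in blacklist:
        by_chrom[chrom].append((start, end))
    index: Index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged: list[list[int]] = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)  # extend the previous interval
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps_blacklist(peak: Interval, index: Index) -> bool:
    chrom, start, end = peak
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Merged intervals are disjoint and sorted, so the only candidate is the
    # last one that starts before the peak ends.
    i = bisect_left(starts, end) - 1
    return i >= 0 and ends[i] > start


def filter_peaks(peaks: list[Interval], blacklist: list[Interval]) -> list[Interval]:
    """Keep only the peaks that do not touch any blacklisted region."""
    index = build_index(blacklist)
    return [p for p in peaks if not overlaps_blacklist(p, index)]


# Usage example with made-up coordinates.
blacklist = [("chr1", 1000, 2000), ("chr1", 1500, 3000), ("chr2", 0, 100)]
peaks = [("chr1", 2900, 3100), ("chr1", 3000, 3500), ("chr1", 500, 1000),
         ("chr2", 100, 200), ("chr3", 0, 10), ("chr2", 50, 60)]
print(filter_peaks(peaks, blacklist))
```

Merging the blacklist first makes its intervals disjoint and sorted, so a single binary search per peak is enough to decide whether it overlaps.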

IanCodes • Apr 12 '17 09:04

@IanCodes

Does ENCODE 'blacklist' mean unmappable regions?

Thank you for emphasising the lambda modelling. I will test how it affects the number of peaks and then decide whether to use it.

gireeshkbogu • Apr 12 '17 10:04

It is more complicated than mappability, see: https://sites.google.com/site/anshulkundaje/projects/blacklists

That said, the summary suggests the blacklists are not applicable to RNA-seq data. But then, as these are transcriptome-associated reads, you might be OK.

IanCodes • Apr 12 '17 10:04

Yes, I think this blacklist won't affect the RNA-Seq analysis. Thanks @IanCodes!

gireeshkbogu • Apr 12 '17 10:04

Hi Gireesh @gireeshkbogu, I am also trying to call peaks on RNA-seq datasets using MACS2. Could you please share your parameter settings with me? I would really appreciate it. Thank you!

Best regards, Min

genecell • Jan 10 '23 22:01

@gireeshkbogu @genecell Same!

Kiliankleemann • Jan 29 '24 13:01