MACS icon indicating copy to clipboard operation
MACS copied to clipboard

macs2 callpeak option about effective genome size and shift model

Open LunaPatton opened this issue 3 years ago • 5 comments

Hi Recently I try to run ATAC-seq data based on ENCODE pipeline, when I call peaks, I notice the default value for some certain species like human and mouse, while my species is animal (e.g., sheep), so how can I properly set this parameter? Besides, I wonder how to determine the parameters for shift model. --nomodel --shift -100 --extsize 200 or --nomodel --shift -75 --extsize 150? Does it depends on the length of paired-end reads? @taoliu Thanks !

Best wishes Zoe

LunaPatton avatar May 24 '21 02:05 LunaPatton

Hi Zoe, the parameter for the shift model won't make a noticeable difference among species. No matter if it is --nomodel --shoft -100 --extsize 200 or --nomodel --shift -75 --extsize 150, you essentially pile up the cutting sites. The only difference is the smoothing size -- 200 vs 150. You may need to tweak the -g parameter to fit the genome size of sheep, since for ATAC-seq you won't have input data, and MACS relies on 'global background' (= total signal/effective genome size) to calculate fold change or p-value.

taoliu avatar May 24 '21 18:05 taoliu

Hi Zoe, the parameter for the shift model won't make a noticeable difference among species. No matter if it is --nomodel --shoft -100 --extsize 200 or --nomodel --shift -75 --extsize 150, you essentially pile up the cutting sites. The only difference is the smoothing size -- 200 vs 150. You may need to tweak the -g parameter to fit the genome size of sheep, since for ATAC-seq you won't have input data, and MACS relies on 'global background' (= total signal/effective genome size) to calculate fold change or p-value.

Thanks for replying! Actually the genome size of sheep is ~2.8G, so '-g' parameter needs be set by myself or the software can calculate it automatically? Is there any materials for me to refer ? @taoliu

Best wishes Zoe

LunaPatton avatar May 25 '21 03:05 LunaPatton

@LunaPatton You have to put -g THE_GENOME_SIZE manually. I checked the sheep genome at UCSC genome browser site. Some suggestions: 1. you may need to get rid of a large number of 'chrUN' from your alignment files. Although it's important to use all "chromosomes/contigs" during the alignment step, biologically these unknown chromosomes won't give you much insight into gene regulation and these chrUN may cause MACS to waste a lot of memory and compute time. 2. you may need to subtract the non-'chrUN' genome size by the size of 'simple repeats' on the non-'chrUN' genome and use that result as the 'effective genome size'. Because that may represent the actual genome size that the alignment results for MACS can cover. You can use UCSC table browser to calculate that. ( or you can use other resource such as ENSEMBL)

taoliu avatar May 27 '21 12:05 taoliu

@LunaPatton You have to put -g THE_GENOME_SIZE manually. I checked the sheep genome at UCSC genome browser site. Some suggestions: 1. you may need to get rid of a large number of 'chrUN' from your alignment files. Although it's important to use all "chromosomes/contigs" during the alignment step, biologically these unknown chromosomes won't give you much insight into gene regulation and these chrUN may cause MACS to waste a lot of memory and compute time. 2. you may need to subtract the non-'chrUN' genome size by the size of 'simple repeats' on the non-'chrUN' genome and use that result as the 'effective genome size'. Because that may represent the actual genome size that the alignment results for MACS can cover. You can use UCSC table browser to calculate that. ( or you can use other resource such as ENSEMBL)

Thanks for your suggestion tao. And there are some other questions (maybe stupid):

  1. Should effective genome size be precise ? I followed your advice and tried to calculate via UCSC browser, while version of reference genome was limited (i.e., lastest one is 2015) and could not match with mine (Ensembl, 2017), so I simplely removed Ns and scaffolds to calculate. Does it work ?
  2. I noticed that several pipeline recommand to remove sex chromsome as well, does it neccessary to exclude ?
  3. Have you compared MACS2 and Genrich before? I used my data to ran both of two and turned out to be quite different.

LunaPatton avatar Sep 27 '21 03:09 LunaPatton

@taoliu

LunaPatton avatar Sep 27 '21 03:09 LunaPatton