MACS
Q: Calculating the genome size needed for the MACS --gsize option
Hi,
I had a query about calculating the genome size needed for the MACS --gsize option.
Use case: I will be using genomes that are not listed in your documentation, so I need to calculate the genome size values myself.
To check I can do this correctly, I tried to get the same values for human and mouse as reported in your documentation (https://macs3-project.github.io/MACS/docs/callpeak.html).
Describe the problem: To perform the calculation, I ran the script unique-kmers.py as described at: https://deeptools.readthedocs.io/en/develop/content/feature/effectiveGenomeSize.html
My result for human38 (assuming 100-bp k-mers) was similar to the value reported in the MACS documentation: 2.8e9 vs 2.7e9, respectively.
However, the results for mouse38 were substantially different: 2.47e9 (my calculation) vs 1.87e9 (MACS documentation).
(As might be expected, my calculations agree with those displayed on https://deeptools.readthedocs.io/en/develop/content/feature/effectiveGenomeSize.html for 100-bp kmers.)
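To put the two discrepancies on the same scale, the relative differences can be worked out from the figures quoted above. A quick sketch in Python (the function name is just for illustration):

```python
def relative_difference(calculated: float, documented: float) -> float:
    """Return (calculated - documented) / documented."""
    return (calculated - documented) / documented


# Values quoted above: my unique-kmers.py result vs the MACS documentation.
human = relative_difference(2.8e9, 2.7e9)
mouse = relative_difference(2.47e9, 1.87e9)

print(f"human: {human:.1%}")  # human: 3.7%
print(f"mouse: {mouse:.1%}")  # mouse: 32.1%
```

So the human values differ by under 4%, while my mouse value is roughly a third larger than the documented one.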
Sorry if I am being naive or have overlooked something obvious, but I was wondering whether I should be calculating the effective genome size in a different way. Do you recommend a particular tool for this calculation?
Any helpful feedback would be much appreciated.
Best regards, Steven
@StevenWingett In fact, the default gsize values for hs and mm were determined many years ago, using much shorter reads and older assemblies. I need to update the numbers now. Thanks for this information! I will update the numbers according to the deepTools page: https://deeptools.readthedocs.io/en/develop/content/feature/effectiveGenomeSize.html
Hi @taoliu
Thanks for getting back to me. I'm glad that solves the issue.
I presume for non-standard genomes you would advise running the DeepTools unique-kmers.py script to determine the gsize value?
Thanks very much once again.
All the best, Steven
@StevenWingett Yes. The gsize value should be the same as the number deepTools reports using the unique-kmers approach. Basically, it should refer to the 'mappable' part of the genome in the read alignment process.
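For anyone who wants to see what the unique-kmers approach measures, here is a minimal sketch that counts distinct k-mers exactly on a tiny in-memory FASTA. It is not the unique-kmers.py implementation, which estimates the count probabilistically so that it scales to whole genomes. The helper names `read_fasta` and `count_distinct_kmers` are made up for this example, and the sketch makes two simplifying assumptions: k-mers containing N are skipped, and reverse complements are not merged.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def count_distinct_kmers(sequences: Iterable[str], k: int) -> int:
    """Count distinct k-mers across all sequences, skipping any that contain N."""
    seen = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                seen.add(kmer)
    return len(seen)


# Toy genome with two short "chromosomes"; k=4 stands in for the read length.
demo = io.StringIO(">chrA\nACGTACGTNN\n>chrB\nACGTTT\n")
seqs = [seq for _, seq in read_fasta(demo)]
gsize = count_distinct_kmers(seqs, k=4)
print(gsize)  # 6
```

The repeated ACGT is counted only once and the N-containing k-mers are dropped, which is why the result is smaller than the total sequence length. Holding every k-mer in a set is only practical for small inputs; for a real genome, use unique-kmers.py and pass the resulting number to --gsize (scientific notation such as 2.47e9 is accepted).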
Thanks