Q: calculating the genome size needed for the MACS --gsize option

Open StevenWingett opened this issue 3 years ago • 4 comments

Hi,

I had a query about calculating the genome size needed for the MACS --gsize option.

Use case: I shall be using genomes not listed in your documentation, so I need to calculate the genome size values myself.

To check I can do this correctly, I tried to get the same values for human and mouse as reported in your documentation (https://macs3-project.github.io/MACS/docs/callpeak.html).

Describe the problem: To perform the calculation, I ran the script unique-kmers.py as described at: https://deeptools.readthedocs.io/en/develop/content/feature/effectiveGenomeSize.html

My result for human38 (assuming 100-bp k-mers) was similar to the value reported in the MACS documentation: 2.8e9 (my calculation) vs 2.7e9 (MACS documentation).

However, the result for mouse38 was substantially different: 2.47e9 (my calculation) vs 1.87e9 (MACS documentation).

(As might be expected, my calculations agree with those displayed on https://deeptools.readthedocs.io/en/develop/content/feature/effectiveGenomeSize.html for 100-bp k-mers.)

Sorry if I am being naive or have overlooked something obvious, but I was wondering whether I should be calculating the effective genome size in a different way. Do you recommend a particular tool for this calculation?

Any helpful feedback would be much appreciated.

Best regards, Steven

StevenWingett, May 04 '22 11:05

@StevenWingett In fact, the default gsize values for hs and mm were determined many years ago using much shorter reads and older assemblies. I need to update the numbers now. Thanks for this information! I will update the numbers according to the deeptools page: https://deeptools.readthedocs.io/en/develop/content/feature/effectiveGenomeSize.html

taoliu, May 18 '22 15:05

Hi @taoliu

Thanks for getting back to me. I'm glad that solves the issue.

I presume for non-standard genomes you would advise running the DeepTools unique-kmers.py script to determine the gsize value?

Thanks very much once again.

All the best, Steven

StevenWingett, May 19 '22 09:05

@StevenWingett Yes. That should give the same numbers as deeptools reports with the unique-kmers approach. Basically, gsize should reflect the 'mappable' part of the genome, i.e. the portion to which reads can be aligned during the read alignment process.
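To make the unique-kmers idea concrete, here is a minimal Python sketch that counts the distinct k-mers in a FASTA file, which is the quantity used as the effective genome size for a given read length. It is not the unique-kmers.py script itself: that script estimates the count probabilistically, whereas this version stores every k-mer in a set, so it is only practical for small genomes. The function names `read_fasta` and `count_distinct_kmers` are hypothetical, and counting forward-strand k-mers only while skipping any k-mer containing `N` are assumptions made for illustration.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def count_distinct_kmers(sequences: Iterable[str], k: int) -> int:
    """Count distinct k-mers across all sequences (exact, memory-hungry).

    Assumptions for this sketch: forward strand only, and any k-mer
    containing an ambiguous base (N) is ignored.
    """
    seen = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                seen.add(kmer)
    return len(seen)


# Toy example with an in-memory FASTA; a real run would open a genome file
# and use k equal to the read length (e.g. 100).
toy_fasta = io.StringIO(">chr1\nACGTACGT\n>chr2\nNNACGTAA\n")
toy_size = count_distinct_kmers((seq for _, seq in read_fasta(toy_fasta)), k=4)
print(toy_size)  # 5
```

The resulting number would then be passed to MACS as a plain or scientific-notation value, for example `--gsize 2.47e9`. For full-size genomes, a probabilistic estimator such as the one in unique-kmers.py is the more realistic choice.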

taoliu, Jun 06 '22 18:06

Thanks

StevenWingett, Jun 08 '22 08:06