
Hard coded total chromosome sizes

Open nfortelny opened this issue 9 years ago • 2 comments

Total chromosome sizes are hardcoded in the functions "macs2CallPeaksATACSeq" and "macs2CallPeaks" of ngstk.py, so I ran into problems when I ran the analysis with mm9. Maybe the genome size could be made configurable in atacseq.yaml.

Also, I wonder if those genome sizes are correct. For mm9, I summed up the chromosome sizes from the chromosome sizes file /data/prod/ngs_resources/genomes/mm9/mm9_chromlength.txt and the total I get corresponds exactly to the one listed here: http://genomewiki.ucsc.edu/index.php/Genome_size_statistics

However, if I do the same for the other genomes (e.g. hg19), I get 3.1e9 bases, which is similar to the value at the link above but different from what's defined in ngstk.py.
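For reference, the summation described above can be sketched roughly as follows. This assumes the chromosome sizes file has two whitespace-separated columns (chromosome name, then length in bases), which is the usual layout but is not confirmed here for mm9_chromlength.txt. The function name `total_genome_size` and the example values are made up for illustration.

```python
import io


def total_genome_size(lines):
    """Sum the length column of a chrom-sizes style file.

    Assumes each non-empty line looks like: <chrom_name> <length>.
    `lines` can be an open file handle or any iterable of strings.
    """
    total = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        total += int(fields[1])
    return total


# Tiny made-up input, not real chromosome lengths
example = io.StringIO("chr1\t1000\nchr2\t500\nchrM\t16\n")
print(total_genome_size(example))
```

With a real file, one would pass an open handle instead, e.g. `with open(path) as fh: total_genome_size(fh)`.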

nfortelny avatar Sep 28 '16 09:09 nfortelny

Those numbers are taken straight from here: https://github.com/taoliu/MACS I guess one could be more accurate, but I wouldn't think it is so critical.

afrendeiro avatar Nov 04 '16 10:11 afrendeiro

@afrendeiro what do you think about changing these to use refgenieconf? All we would need is a chrom_sizes asset, and then you would just use `refgenieconf.get_asset(genome, "chrom_sizes")` to get the chrom sizes file.

That way it works with any genome.
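A rough sketch of what that could look like on the peak-calling side, assuming the path to the chrom sizes file has already been resolved (the refgenie lookup itself is left out, since its exact call is not confirmed here). The helper names `genome_size_from_chrom_sizes` and `build_macs2_cmd` are hypothetical and not part of ngstk.py. Note that this passes the summed chromosome length to `-g`, whereas MACS2 documents that option as the effective (mappable) genome size, so the summed total is an upper bound rather than an equivalent value.

```python
def genome_size_from_chrom_sizes(path):
    """Return the summed chromosome lengths from a two-column chrom sizes file.

    Assumes lines of the form: <chrom_name> <length>, with no header row.
    """
    total = 0
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                total += int(fields[1])
    return total


def build_macs2_cmd(treatment_bam, chrom_sizes_path, name, outdir="."):
    """Build a `macs2 callpeak` argument list with a computed genome size.

    Hypothetical helper: the genome size comes from the chrom sizes file
    instead of a hardcoded per-genome table.
    """
    gsize = genome_size_from_chrom_sizes(chrom_sizes_path)
    return [
        "macs2", "callpeak",
        "-t", treatment_bam,   # treatment alignments
        "-g", str(gsize),      # genome size as a plain number
        "-n", name,            # prefix for output files
        "--outdir", outdir,
    ]
```

The returned list can be passed to `subprocess.run` or joined into a shell string, depending on how the pipeline runs commands.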

nsheff avatar Jun 18 '19 21:06 nsheff