MACS MACS2 stalls pre-computing pvalue-qvalue table..

Hello,

MACS2 stalls at the following line:

#3 Pre-compute pvalue-qvalue table...

I have two BAM files, one converted to BAM from Bowtie2 output and the other which has underwent some processing. Here is an example alignment from the raw BAM file:

HISEQ:127:C7HJ7ANXX:1:1312:20025:49343  163 chr1    10069   30  50M =   10075   56  ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACC  CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG  MD:Z:50 PG:Z:MarkDuplicates XG:i:0  NM:i:0  XM:i:0  XN:i:0  XO:i:0  AS:i:0  XS:i:0  YS:i:0  YT:Z:CP

and here is an example from the processed BAM file:

HISEQ:127:C7HJ7ANXX:1:1312:20025:49343/1    16  chr1    10070   255 50M *   0   0   *   *

MACS2 stalls when I input the processed BAM file but not the raw file. The processed BAM file was created by converting the raw BAM to a BED file, processing the data with AWK, and then converting back to a BAM file (both conversions done using BEDtools).

I call peaks using the following command:

macs2 callpeak -t sample.bam -f BAMPE -g hs -s 50 -n sample -q 0.01 --nomodel --nolambda --keep-dup all --call-summits

Thank you

Nov 06 '15 15:11 jma1991

Is it related to #86?

Dec 22 '15 21:12 taoliu

I confirm there is still a problem in current master (9ddacb0). Using the following SAM input file:

FCC5W8BACXX:1:1101:16503:36846#ATCCAAGC/2       16      chr2L   22700622        44      100M    *       0       0       CGAGATCAACGCACAACTCGAGGTCGAAATCGAATTCGGCAAAAGACGACTTCGCATAAACATCCTAATACTACCAGGGGTGGAATTTCCTCACCCCGAA    ^b`_YJTG[ab^YSGS]a^Ya^^[X]_]]ca_\V^cdcdeedddef_\Zeabgggcfbgece_aaehgcaaaYdffaghgbfde_^cgeebWO`ccc__^    AS:i:200        XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:100        YT:Z:UU

and the following command line:

macs2 callpeak --outdir delme --name macs2 --format SAM --nomodel --treatment input.sam

macs2 get stalled at step "Pre-compute pvalue-qvalue table".

Mar 01 '17 14:03 pveber

I found this thread when troubleshooting run times for jobs that I had running, where MACS2 was part of an analysis pipeline to process ChIPseq data. I looked at the error files as the jobs had been running for several days, which was odd. Tail of the error files showed the jobs were hung at the same step in MACS2 reported here - issue #101 and #86.

INFO @ Fri, 29 Jul 2022 10:03:32: #3 Call peaks... INFO @ Fri, 29 Jul 2022 10:03:32: #3 Pre-compute pvalue-qvalue table...

I've tried using macs2 --version 2.2.6 and 2.2.7.1 using a HPC grid, and tried using 1,4, and 8 slots, expanding buffer, requested nodes with > 10GB RAM

The odd part is that when I ran MACS2 with all default parameters the jobs finished in a few hours. macs2 callpeak -f BAM -t /pathtomyfile/hisat2.sorted.bam -c /pathtomyfile/control_hisat2.sorted.bam -g 1e-9 -n sample1

However, when I add the the following parameters the program is hung at the indicated steps: --nomodel --extsize 147

or

--nomodel --extsize 147 --broad --broad-cutoff 0.1

Single threads being used seem to be computing efficiently at 100%, but the jobs simply spin for up to 7 days. These run times seem way too long, as file read in for macs2 is super fast - 8.5G BAM files load and are read in (steps 1-3) ~ 10 minutes.

So, this seems like a bug rather than a compute environment issues. Do you have suggestions on potential fixes?

One other note - I recently used the same versions of MACS2 on the same compute system on ATACseq data and used similar --no-model parameters, but only had treatment files, no control input files. These jobs used even larger files and finished within 12-24 hours.

Aug 09 '22 18:08 jcohn42

A follow up regarding this issue - I found that I get different results with different files. I tested pre-processed BED files from SRA - also ChIPseq data, and macs2 or macs3 completed peak calling in about 30 minutes, no issues. However, I downloaded the raw data from SRA and processed with my pipeline and I get the same issue - the process hangs at Pre-compute pvalue-quvalue table.. My pipeline uses trimmomatic to trim illimina adaptors, hisat2 alignment (-p 8 -k 1 --no-spliced-alignment), bamsormadup to remove duplicates and sort bam output. macs2 doesn't throw and error, but just hangs. Any ideas would be much appreciated.

Aug 17 '22 12:08 jcohn42

@jcohn42 Is it possible that the raw data processed from SRA contains many random chromosomes? The process hangs at p-q table calculation is an old issue I dealt with years ago where when the hash table (we used khash) becomes too big. Please let me know how it goes with macs3 since we are working on it. Another fix/idea I got is that maybe we should provide an option to skip converting p-value to q-value entirely to save time because people right now have better ways to control the false positive rate such as IDR on ranked peaks based on p-value or foldchange.

Aug 17 '22 17:08 taoliu

@taoliu, yes that is right, the SRA data does have many chromosomes (10 maize chromosomes and a few dozed contigs). They are present in the pre-processed BED files downloaded from SRA, I used the same reference genome for aligning the raw data using hisat2, so the bam files resulting have the same chromosomes. The only major difference I noted is that the external BED files seem to have been pre-filtered for alignment quality; $cut -5 |sort -u gives single value of 255. hisat2 uses different quality measures, but I could try filtering on 60. I don't think macs2 uses the quality information, though, is that right? I've tried macs2 and macs3 - both get hung on the same step. I agree, q values are great to have, but one can filter on p values, or use IDR or combine replicates with MSPC, etc. Since my original data was PE I also tried -f BAMPE and -f BAM on the deduplicated BAM files, but both get stuck at the same place. I could try to filter input files for only 'real' chromosome data, but I've already tried filtering out a single chromosome of data for troubleshooting, and I have the same issue.

Aug 17 '22 20:08 jcohn42

@jcohn42 Could you share me the SRA accession number so that I can give it a try?

Aug 17 '22 21:08 taoliu

@taoliu The files come from https://www.ncbi.nlm.nih.gov/bioproject/PRJNA492464 GSM3397995 Chip B73 leaf H3K56ac rep1 SRR7889772 GSM3398001 Chip B73 leaf input rep1 SRR7889778

Here is my macs3 call with the processed BED files that worked beautifully, completing within 30 minutes macs3 callpeak -f BED -t /scratch-large/5-biannual/COH3780_chip/TEST_ext/GSM3397995_mapped_chip_B73_leaf_H3K56ac_rep1.bed -c /scratch-large/5- biannual/COH3780_chip/TEST_ext/GSM3398001_mapped_chip_B73_leaf_input_rep1.bed -g 2100000000 -n TESText --outdir /scratch-large/5-biannual/COH3780_chip/TEST_ext --nomodel --extsize 147

FASTQ files were aligned to B73_v4 genome, an internal copy, but should be very similar to B73 RefGen_v4, AGPv4, available from here: https://www.maizegdb.org/assembly. Note, there is a newer genome, but this was the genome used to generate the processed files tested.

I used the same macs3 call, with the change "-f BAM" and -t <deduplicated, sorted bam files for test> -c <deduplicated, sorted bam file for input". No errors, but still paused on #3 Pre-compute pvalue-qvalue table..., running now ~ 20 hours.

Do you have recommendations for buffer size, # slots, node parameters, etc? I've requested > 10G RAM, which should be fine, I think?

Aug 17 '22 22:08 jcohn42

@taoliu, can you clarify if the problem you described was 'fixed' "The process hangs at p-q table calculation is an old issue I dealt with years ago where when the hash table (we used khash) becomes too big." What was the fix, is there some work around, like adjusting input file sizes, processing one chromosome at a time, etc, that I could use to get MACS2 (or MACS3) working for my data? Are there particular filters of aligned reads that should be used?

Thanks for your recommendations and help.

Aug 23 '22 21:08 jcohn42

@taoliu I discovered the problem - it was an issue with my genome size input - a typo! Thanks again for your replies; btw, MACS3 is also working quite well with my data and the external data. Thanks for providing a very useful tool!

Aug 26 '22 12:08 jcohn42

MACS MACS copied to clipboard

MACS2 stalls pre-computing pvalue-qvalue table..

MACS
MACS copied to clipboard