Optimizing Time and Memory Usage for Large BAM Files

Open FarzanehRah opened this issue 1 year ago • 6 comments

Hi, I've been trying to use ATACseqQC for quality control of some plant BAM files (exceeding 100GB in size). However, I encountered difficulties when attempting to run it, even after allocating a large amount of memory in our HPC system. I am looking for ways to optimize both time and memory usage, especially when dealing with significantly large BAM files. I couldn't find any option, for example, for the shiftGAlignmentsList function.

Many thanks in advance for your assistance.

FarzanehRah, Oct 31 '23 15:10

I suppose you are using bigFile=TRUE when you call readBamFile. You can try splitting the BAM file into smaller ones, running shiftGAlignmentsList on each small BAM, and then merging the results afterwards. To split the BAM file, please refer to samtools view.

jianhong, Nov 02 '23 17:11

Hi, thank you for your reply. Yes, I am using bigFile=TRUE. I applied the shiftGAlignmentsList function to one replicate of my samples, but I ran into issues in the subsequent steps as well. Is it possible to parallelize these functions? Even with 2 TB of memory allocated, I get this error: "long vectors not supported yet: ../../../include/Rinlinedfuns.h:537", followed by "Execution halted". Thanks

FarzanehRah, Nov 06 '23 13:11

Could you please first try splitting the BAM file into smaller ones, for example one that contains only the chr1 reads: samtools view -bho chr1.bam input.bam chr1
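A minimal sketch of how the same command could be repeated for several chromosomes. It only prints the commands (a dry run) so they can be checked before anything is executed. The helper name split_cmds and the chromosome names are placeholders, not part of ATACseqQC or samtools, and the sketch assumes input.bam is coordinate-sorted and indexed, which samtools view needs for region queries.

```shell
#!/bin/sh
# Dry run: print one "samtools view" command per chromosome.
# split_cmds is a hypothetical helper; chr1..chr3 are placeholder names.
# Assumes the input BAM is coordinate-sorted and has an index (.bai/.csi).
split_cmds() {
    bam=$1
    shift
    for chr in "$@"; do
        # -b: BAM output, -h: keep the header, -o: output file
        printf 'samtools view -bho %s.bam %s %s\n' "$chr" "$bam" "$chr"
    done
}

split_cmds input.bam chr1 chr2 chr3
```

Piping the printed lines to sh would run them. The real chromosome names can be listed with samtools idxstats input.bam, which prints one reference name per line in its first column (plus a final * line for unplaced reads).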

jianhong, Nov 06 '23 19:11

Actually, I already have the results for one chromosome, but the researcher prefers to have information for the entire genome, since the results are not interpretable in isolation. Thanks again. (Attached image: test_PT_score)

FarzanehRah, Nov 06 '23 21:11

Simply process the chromosomes one by one and then merge the split BAM files.
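A rough sketch of what the merge step could look like with samtools, again printed as a dry run. The helper name merge_cmds and the file names (chr1.shifted.bam and so on) are hypothetical stand-ins for whatever per-chromosome BAMs the shifting step writes; samtools merge expects its inputs to be sorted in the same order.

```shell
#!/bin/sh
# Dry run: print the commands that would merge per-chromosome BAMs
# into one file and index the result.
# merge_cmds and all file names below are hypothetical placeholders.
merge_cmds() {
    out=$1
    shift
    # Positional form: samtools merge <out.bam> <in1.bam> <in2.bam> ...
    printf 'samtools merge %s %s\n' "$out" "$*"
    printf 'samtools index %s\n' "$out"
}

merge_cmds merged.bam chr1.shifted.bam chr2.shifted.bam chr3.shifted.bam
```

As before, the printed lines can be piped to sh once they look right.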

jianhong, Nov 06 '23 21:11

So, at which point should I merge them? For instance, to generate a PTscore plot of the entire genome, I require a gal1 object of all chromosomes. However, creating a gal1 object from merged BAM files takes several days. Thank you again for your time.

FarzanehRah, Nov 07 '23 00:11