ATACseqQC
Optimizing Time and Memory Usage for Large BAM Files
Hi,
I've been trying to use ATACseqQC for quality control of some plant BAM files (exceeding 100GB in size). However, I encountered difficulties when attempting to run it, even after allocating a large amount of memory in our HPC system. I am looking for ways to optimize both time and memory usage, especially when dealing with significantly large BAM files. I couldn't find any such option, for example, for the shiftGAlignmentsList function.
Many thanks in advance for your assistance.
I suppose you are using bigFile=TRUE when you call readBamFile.
You can try splitting the BAM file into smaller ones, running shiftGAlignmentsList on each small BAM, and then merging the results afterwards.
To split the BAM file, please refer to samtools view.
Hi, thank you for your reply. Yes, I am using bigFile=TRUE. I applied the shiftGAlignmentsList function to one replicate of my samples. However, I encountered issues in the subsequent steps as well. I am wondering whether it is possible to parallelize these functions. Even with 2 TB of memory allocated, I get an error: long vectors not supported yet: ../../../include/Rinlinedfuns.h:537 Execution halted.
Thanks
Could you please first try splitting the BAM file into smaller ones, for example one that contains only the chr1 reads: samtools view -bho chr1.bam input.bam chr1
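To extend the chr1 command to every reference sequence, something like the following might work. It is only a sketch: split_by_chrom is a hypothetical helper name, and it assumes input.bam is coordinate-sorted and already indexed (samtools index input.bam), since region queries need the index.

```shell
#!/usr/bin/env bash
# Sketch: write one BAM per reference sequence of an indexed, sorted BAM.
# split_by_chrom is a hypothetical helper, not part of samtools or ATACseqQC.
split_by_chrom() {
  local bam="$1" outdir="$2" chr
  mkdir -p "$outdir" || return 1
  # idxstats prints one reference per line; the final "*" row counts
  # unplaced reads and is not a valid region, so it is filtered out.
  samtools idxstats "$bam" | cut -f1 | grep -vx '\*' | while read -r chr; do
    # Same flags as the chr1 command, with the region taken from the loop.
    samtools view -b -h -o "$outdir/$chr.bam" "$bam" "$chr"
    samtools index "$outdir/$chr.bam"
  done
}
```

It would be called as split_by_chrom input.bam by_chrom, giving by_chrom/chr1.bam and so on. For an assembly with many small scaffolds this produces many small files, so grouping scaffolds into batches may be preferable.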
Actually, I have the results for one chromosome, but the researcher prefers to have information for the entire genome, since the results are not interpretable in isolation.
Thanks again.
Simply process them one by one and then merge the split BAM files.
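The merge step could look roughly like this. It is a sketch under assumptions: merge_shifted is a hypothetical helper, the *.shifted.bam naming is made up for illustration (for example, files written through the outbam argument of shiftGAlignmentsList), and the inputs are assumed to be coordinate-sorted.

```shell
#!/usr/bin/env bash
# Sketch: merge per-chromosome shifted BAMs back into one indexed BAM.
# merge_shifted and the *.shifted.bam naming are assumptions for illustration.
merge_shifted() {
  local indir="$1" out="$2"
  # samtools merge combines sorted inputs into one sorted BAM;
  # -f overwrites the output file if it already exists.
  samtools merge -f "$out" "$indir"/*.shifted.bam && samtools index "$out"
}
```

For example, merge_shifted by_chrom genome.shifted.bam would merge every by_chrom/*.shifted.bam file and index the result.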
So, at which point should I merge them? For instance, to generate a PTscore plot of the entire genome, I need a gal1 object covering all chromosomes. However, creating a gal1 object from the merged BAM files takes several days.
Thank you again for your time.