QUAST storing sam, bam, sorted.bam
Hello, I was wondering why QUAST keeps the sam, bam and sorted.bam files on disk at the same time. This takes up a huge amount of disk space. I tried the --space-efficient option, but it still writes a sam, then a bam, and then a sorted.bam to disk, so the alignment is effectively written to disk three times.
Here is my command:
./quast-5.0.2/quast.py --eukaryote --large --circos --pe1 $R1 --pe2 $R2 --pacbio ../allPB.fa --nanopore ../allONTvaga.fa --threads 24 -o quast_report shasta_final.fa --space-efficient
Thank you.
EDIT: Is it because --space-efficient is in the wrong position among the arguments? If so, sorry ><
Actually, it is not a matter of where the argument is placed. I also noticed that QUAST seems to use only half of the specified number of threads.
I vote in support of this issue. The temporary storage required when analyzing raw reads appears excessive because of redundancy, and it may be the cause of most of the "No space left on device" errors. One example I ran into: I have a 12 Mbase genome and an assembly of the same size that I would like to evaluate. For scale:
- 3GB of nanopore reads (fastq.gz)
- 16GB of illumina reads (fastq.gz)
- The whole analysis directory is < 100GB, including several processed datasets and multiple assemblies.
The process maxed out at 500GB, at which point the disk was full. All of that was in the QUAST temporary folder, which contained:
- copies of the input reads (fastq unzipped)
- .sam+bam files of all the alignments + the sorted .sam files
I think this problem could be addressed with relative ease by deleting intermediate files (e.g. deleting the sam files once the bam files have been created) or by streaming the alignments into samtools through a Unix pipe. From what I understand from the documentation, --space-efficient refers to RAM requirements, not disk.
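To make the pipe suggestion concrete, here is a minimal sketch in Python of streaming an aligner's SAM output straight into samtools sort, so that only the sorted BAM is written to disk. This is not how QUAST handles reads internally. It assumes minimap2 and samtools are on the PATH, and the function names (build_pipeline, describe, align_to_sorted_bam) are hypothetical helpers made up for illustration.

```python
import shlex
import subprocess


def build_pipeline(reference, reads, sorted_bam, threads=4, preset="map-ont"):
    """Build the two commands of the pipeline without running them.

    Assumption: minimap2 is the aligner (it reads gzipped FASTQ directly,
    so no unzipped copy of the reads is needed).
    """
    # -a makes minimap2 emit SAM on standard output instead of PAF
    align_cmd = ["minimap2", "-a", "-x", preset, "-t", str(threads),
                 reference, reads]
    # the trailing "-" tells samtools sort to read SAM from standard input
    sort_cmd = ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, "-"]
    return align_cmd, sort_cmd


def describe(reference, reads, sorted_bam, threads=4, preset="map-ont"):
    """Return the equivalent shell one-liner, for logging or a dry run."""
    cmds = build_pipeline(reference, reads, sorted_bam, threads, preset)
    return " | ".join(shlex.join(cmd) for cmd in cmds)


def align_to_sorted_bam(reference, reads, sorted_bam, threads=4,
                        preset="map-ont"):
    """Run aligner | samtools sort; no .sam or unsorted .bam is kept."""
    align_cmd, sort_cmd = build_pipeline(reference, reads, sorted_bam,
                                         threads, preset)
    aligner = subprocess.Popen(align_cmd, stdout=subprocess.PIPE)
    sorter = subprocess.Popen(sort_cmd, stdin=aligner.stdout)
    # Close our copy of the pipe so the aligner sees a broken pipe
    # if samtools exits early.
    aligner.stdout.close()
    sort_rc = sorter.wait()
    align_rc = aligner.wait()
    if align_rc != 0 or sort_rc != 0:
        raise RuntimeError(
            f"pipeline failed (aligner={align_rc}, samtools={sort_rc})")
    return sorted_bam
```

For example, describe("ref.fa", "reads.fastq.gz", "out.sorted.bam", threads=8) returns the one-liner that align_to_sorted_bam would run. Note that samtools sort can still write temporary chunks while sorting large inputs, but those are compressed and removed when the sort finishes, so the peak footprint should be far below keeping a sam, a bam and a sorted bam side by side.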