Augustus
filterBam returns empty file
I am trying to reproduce the "Predicting Genes in Single Genomes with AUGUSTUS" tutorial:
https://pubmed.ncbi.nlm.nih.gov/30466165/
When I tried to reproduce the "Filtering BAM" step using the filterBam program, I received an empty BAM file as a result.
The command line I used was: filterBam --in aligned.bam --out aligned.f.bam --uniq
where "aligned.bam" is a BAM file obtained with STAR and sorted by read name with samtools.
No error messages were returned.
The output of the above command line was:
processed line 12400001
Processed alignments: 12409107
Summary of filtered alignments:
unmapped : 0
percent identity: 12409107
coverage : 0
not unique : 0
Cmd line: /root/augustus/bin/filterBam --in aligned.bam --out aligned.f.bam --uniq
In your case and in the tutorial, two things come together:
-
filterBam uses the NM tag in BAM files to determine sequence identity, but the STAR aligner does not write an NM tag by default. The NM tag can be requested with the STAR option "--outSAMattributes", so a correct call with "NM" would be:
STAR --genomeDir star_genome --readFilesIn rnaseq.fastq.gz \
--readFilesCommand zcat --outSAMattributes "NH" "HI" "AS" "nM" "NM"
(where "NH" "HI" "AS" "nM" are the default attributes written by STAR).
If the "NM" tag was missing from the BAM file, the filter behavior was unpredictable and could produce the empty output file you observed. This bug has since been fixed (Augustus version > 3.4.0).
-
filterBam needs as input a BAM file sorted by the query name. However, the Aligned.out.ss.bam file created by
samtools sort Aligned.out.s.bam > Aligned.out.ss.bam
mentioned in the tutorial is sorted by coordinates. So if Aligned.out.ss.bam was created by this sort step (and not as a soft link, as the tutorial suggests for STAR output BAM files), the correct input file is Aligned.out.s.bam, created by
samtools sort -n Aligned.out.bam > Aligned.out.s.bam
The correct call of filterBam after the explicit sorting step is therefore:
filterBam --uniq --in Aligned.out.s.bam --out Aligned.out.sf.bam
Take care to use the correct filtered and sorted files in the next steps.
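Both conditions above (NM tag present, input sorted by query name) can be checked before running filterBam. The following is only a minimal sketch: the helper names count_missing_nm and sort_order are hypothetical and not part of Augustus or samtools. Both helpers read SAM text from stdin, so they can be fed from samtools view. The sort-order check assumes the @HD header line carries an SO: field, which samtools sort writes; if the field is absent the helper prints "unknown".

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of Augustus or samtools.
# Both read SAM-formatted text from stdin.

# Print the number of alignment records that carry no NM:i: tag.
count_missing_nm() {
    awk -F'\t' '
        /^@/ { next }                      # skip header lines
        {
            found = 0
            for (i = 12; i <= NF; i++)     # optional tags start at column 12
                if ($i ~ /^NM:i:/) { found = 1; break }
            if (!found) missing++
        }
        END { print missing + 0 }
    '
}

# Print the sort order declared in the @HD header line (SO: field),
# or "unknown" if the header does not declare one.
sort_order() {
    awk -F'\t' '
        /^@HD/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
        }
        END { if (!found) print "unknown" }
    '
}

# Example usage (requires samtools; file name taken from the discussion above):
#   samtools view Aligned.out.s.bam | head -n 100000 | count_missing_nm
#   samtools view -H Aligned.out.s.bam | sort_order
```

A non-zero count from the first helper suggests the alignment was produced without "NM" in --outSAMattributes. For a file produced by samtools sort -n, the second helper should report queryname; coordinate indicates the file was sorted by position and is not suitable as filterBam input.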
Thank you for pointing out this problem, and sorry for the inconvenience.