
filterBam returns empty file

Open CarlosMenFer opened this issue 3 years ago • 2 comments

I am trying to reproduce the "Predicting Genes in Single Genomes with AUGUSTUS" tutorial:

https://pubmed.ncbi.nlm.nih.gov/30466165/

When I tried to reproduce the "Filtering BAM" step using the filterBam program, I received an empty BAM file as a result.

The command line I used was: filterBam --in aligned.bam --out aligned.f.bam --uniq

Here, "aligned.bam" is a BAM file obtained with STAR and sorted by read name with samtools.

No error messages were returned.

CarlosMenFer, Nov 18 '20 12:11

The output of the above command line was:

processed line 12400001


Processed alignments: 12409107

Summary of filtered alignments:

unmapped : 0
percent identity: 12409107
coverage : 0
not unique : 0

Cmd line: /root/augustus/bin/filterBam --in aligned.bam --out aligned.f.bam --uniq

CarlosMenFer, Nov 18 '20 12:11

In your case, and in the tutorial, two things come together:

  • filterBam uses the NM tag in BAM files to determine sequence identity, but the STAR genome alignment does not generate an NM tag by default. The NM tag can be written with the STAR option "--outSAMattributes", so a correct call with "NM" would be:

    STAR --genomeDir star_genome --readFilesIn rnaseq.fastq.gz \
         --readFilesCommand zcat --outSAMattributes "NH" "HI" "AS" "nM" "NM"

    (where "NH" "HI" "AS" "nM" are the default attributes written by STAR).

    If the "NM" tag was missing from the BAM file, the filter behaved unpredictably and could produce the empty output file you observed. This bug has since been fixed (Augustus version > 3.4.0).

  • filterBam needs an input BAM file sorted by query name. However, the Aligned.out.ss.bam file mentioned in the tutorial, created by samtools sort Aligned.out.s.bam > Aligned.out.ss.bam, is sorted by coordinates. So if Aligned.out.ss.bam was created by this sort step (and not as a soft link, as the tutorial suggests for STAR output BAM files), the correct input file is Aligned.out.s.bam (see samtools sort -n Aligned.out.bam > Aligned.out.s.bam). When the explicit sorting step is performed, the correct call of filterBam is therefore:

    filterBam --uniq --in Aligned.out.s.bam --out Aligned.out.sf.bam

    Take care to use the correct filtered and sorted files in the next steps.
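Both conditions above (NM tag present, input sorted by query name) can be checked before running filterBam. The following is a minimal sketch, not part of Augustus or samtools: the function names sam_sort_order and sam_count_missing_nm are hypothetical helpers, and they read SAM text on stdin (for example as produced by samtools view), so they do not parse BAM themselves. Note that the SO field in the @HD header is only what the producing tool declared, and it may be absent.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Augustus or samtools).
# Both read SAM-formatted text on stdin.

# Print the sort order declared in the @HD header line (its SO: field),
# or "unknown" if there is no @HD line or no SO: field.
sam_sort_order() {
    awk -F'\t' '
        /^@HD/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
        }
        END { if (!found) print "unknown" }
    '
}

# Count alignment records that carry no NM:i: optional field.
# Optional fields start at column 12 of a SAM record.
sam_count_missing_nm() {
    awk -F'\t' '
        /^@/ { next }    # skip header lines
        {
            has_nm = 0
            for (i = 12; i <= NF; i++)
                if ($i ~ /^NM:i:/) { has_nm = 1; break }
            if (!has_nm) missing++
        }
        END { print missing + 0 }
    '
}
```

With samtools installed, a possible usage is samtools view -H Aligned.out.s.bam | sam_sort_order, which should print queryname for a name-sorted file, and samtools view Aligned.out.s.bam | head -n 1000 | sam_count_missing_nm, which should print 0 if the first 1000 records all carry an NM tag. The match is case-sensitive, so STAR's default "nM" attribute is not mistaken for "NM".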

Thank you for pointing out this problem, and sorry for the inconvenience.

hmehlan, Sep 06 '21 08:09