Bug: missing transcript annotations (present in StringTie)
Hi Andrey,
Thanks for your previous help, when you found that my missed 1-exon and 2-exon genes were due to the lack of polyA tails in the BAMs. I've fixed this by including the polyA tails that IsoSeq had previously trimmed, and IsoQuant is now working much better.
Now I'm finding that IsoQuant misses transcripts for ~1000 genes (all multi-exon) that StringTie calls correctly. These regions all look good in IGV and have clear polyA tails that mismatch the genome, so I don't know why IsoQuant is missing them.
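One way to double-check the claim that the reads in these regions carry polyA tails that mismatch the genome is to look for a soft-clipped run of A's (or T's, for reverse-strand alignments) at the read's 3' end. The sketch below is a hypothetical helper, not part of IsoQuant, and it assumes IsoSeq-style reads whose orientation matches the transcript, with the sequence stored in reference orientation as in a BAM record. In practice its inputs could come from pysam's `query_sequence`, `cigarstring`, and `is_reverse` attributes; the `min_len` and `min_frac` thresholds are arbitrary illustrative choices.

```python
import re

# Matches one CIGAR operation, e.g. "12S" -> ("12", "S").
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, op) tuples."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def has_softclipped_polya(seq, cigar, is_reverse, min_len=10, min_frac=0.8):
    """Return True if the read's 3' end carries a soft-clipped polyA tail.

    Hypothetical check. Assumes `seq` is in reference orientation, so a
    forward-strand transcript shows its polyA as trailing A's in the right
    soft clip, and a reverse-strand transcript shows it as leading T's in
    the left soft clip.
    """
    # Hard-clipped bases are absent from `seq`, so drop H operations.
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    if not ops:
        return False
    if is_reverse:
        length, op = ops[0]
        clip = seq[:length] if op == "S" else ""
        base = "T"
    else:
        length, op = ops[-1]
        clip = seq[len(seq) - length:] if op == "S" else ""
        base = "A"
    if len(clip) < min_len:
        return False
    # Require the clipped bases to be mostly the expected homopolymer.
    return clip.count(base) / len(clip) >= min_frac


# Forward-strand read: 20 aligned bases, then a 15-base soft-clipped polyA.
print(has_softclipped_polya("C" * 20 + "A" * 15, "20M15S", is_reverse=False))  # True
# Reverse-strand read: the polyA appears as a leading polyT soft clip.
print(has_softclipped_polya("T" * 12 + "G" * 30, "12S30M", is_reverse=True))  # True
# Clip shorter than min_len is not counted as a tail.
print(has_softclipped_polya("C" * 30 + "AAAA", "30M4S", is_reverse=False))  # False
```

Counting how many reads per missed gene pass this check would show whether the polyA signal is consistently present in the BAM, independent of how IsoQuant interprets it.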
I've created a mini BAM file with 2 examples, and a PowerPoint file with IGV screenshots of these examples.
https://personal.broadinstitute.org/scalvo/for_isoquant_debugging/ex2_missing_genes/
- AcaNeff2021.gdna.fasta : reference genome
- IsoQuantMissingGene.pptx : IGV screenshots of examples
- T1.r1.missed_gene.bam : BAM file with 2 regions, each containing a missed gene
Thanks in advance for any help!
Sarah Calvo
Dear @sarahcalvo
IsoQuant is sometimes overly conservative, since its main goal is to maintain high precision, and this can come at the cost of recall. Thanks for the report; I will look into the data. Meanwhile, you may also try the sensitive mode.
Best Andrey
Thanks Andrey! I added the isoquant.log to the same directory. I also just tried --model_construction_strategy sensitive_pacbio and got the same results.