IsoQuant icon indicating copy to clipboard operation
IsoQuant copied to clipboard

Failing with assertion error: "Gene and transcript records have unequal strands"...but they don't?

Open jamestwebber opened this issue 1 year ago • 4 comments

I'm fairly confused about what's happening here... I'm re-running IsoQuant using the extended gtf from a previous run, to try to assign more reads to the previous annotation. I see a ton of warnings that look something like:

WARNING - Gene and transcript records have unequal strands: ENSG00000180185.12: -, ENST00000382666.6: +

Mostly these warnings are for novel transcripts but there are some annotated transcripts like the above. Confusingly, when I check the GTF for these records, there is no inconsistency in the strandedness. I wonder if there is a bug here?

jamestwebber avatar Aug 02 '24 02:08 jamestwebber

Could you send me the GTF file so I can reproduce the issue?

Best Andrey

andrewprzh avatar Aug 02 '24 08:08 andrewprzh

Another strange issue: when I look at the gene entries in the extended GTF vs the original, they seem to have changed the original coordinates? I thought the extended GTF would just contain the original entries unchanged, but perhaps it is trying to refine them?

$ grep 'ENSG00000186777.12' OUT.extended_annotation.gtf | grep '\sgene\s'
chr4    HAVANA  gene    161651  374337  .       -       .       gene_id "ENSG00000186777.12"; transcripts "4"; gene_type "protein_coding"; gene_name "ZNF732"; hgnc_id "HGNC:37138"; havana_gene "OTTHUMG00000159883.3"; 
$ grep 'ENSG00000186777.12' ~/reference/GRCh38.gencode.v39.annotation.basic.gtf | grep '\sgene\s'
chr4    HAVANA  gene    270675  305474  .       -       .       gene_id "ENSG00000186777.12"; gene_type "protein_coding"; gene_name "ZNF732"; level 1; hgnc_id "HGNC:37138"; havana_gene "OTTHUMG00000159883.3";

jamestwebber avatar Aug 02 '24 16:08 jamestwebber

@jamestwebber

If IsoQuant detects novel isoforms and considers them to be a part of a known gene, then coordinates of this known gene are updated as well, since novel isoforms(s) may go beyond original gene region. My assumption is that's what happens here.

Best Andrey

andrewprzh avatar Aug 03 '24 09:08 andrewprzh

Yes, that seems to be what's happening. In one case I saw, the novel isoform didn't overlap with the existing gene exons at all--rather it spanned the entire gene with an intron. I found that a bit surprising, I would maybe expect that to be a novel gene (it might really be a chimeric molecule or something else, not sure).

jamestwebber avatar Aug 05 '24 15:08 jamestwebber

Regarding the main bug in this issue, it seems like it doesn't come up if I use --no_model_construction, so I guess it's happening after that? That's a little surprising to me but it explains how the input GTF can be fine and then we see this error later--perhaps it is modifying the strand of a feature during the model phase.

jamestwebber avatar Aug 08 '24 13:08 jamestwebber

I think I've figured it out 🎉

I was feeding in the extended_annotation.gtf from a previous run, so it contained things like novel_gene_chr1_1 and so on. When creating new novel genes/transcript, IsoQuant isn't checking for whether the name already exists, so it ends up creating two entries that can have contradictory data (like strandedness).

I can fix this for my own workflow for now, by just renaming the genes/transcripts before I run the second round of isoquant (I need to test that this works but I'm pretty confident). But in general I'd consider this is a simple bug that could get fixed in a future version.

jamestwebber avatar Aug 15 '24 19:08 jamestwebber

Can confirm that this runs successfully once I renamed the input gene/transcript IDs to something different. 👍

This did uncover another issue though: when creating the new GTF, IsoQuant will add tags that already exist (e.g. I see two exons tags for transcripts that were present in the input). In some cases it seems to be mangling the formatting while doing this, which is a problem for parsing.

jamestwebber avatar Aug 16 '24 14:08 jamestwebber

@jamestwebber

Thank you so much for the investigation! I'll re-work naming conversion and update current version. I'll keep you posted.

Best Andrey

andrewprzh avatar Aug 20 '24 14:08 andrewprzh

@jamestwebber

Thanks again for the report. Both issues should be now fixed in 3.5.1

andrewprzh avatar Aug 27 '24 10:08 andrewprzh