What is the meaning of ISM None

Open kathryncrouch opened this issue 4 months ago • 2 comments

Hi,

I see quite a few models in my TALON output where the transcript novelty assignment is ISM, and the incomplete splice match type is "None".

I understand what the Prefix, Suffix and Both subtypes mean - but under what conditions is a model assigned to ISM (as opposed to NIC/NNC) but not assigned to one of the subcategories? Is this best just thought of as "Other"?

Many of the models I see like this are partial transcripts that match only one exon in the reference. They therefore match part of the reference model but have no splice junctions to compare against it, yet they aren't considered Genomic. However, I wondered if you had a more formal definition of how ISM None arises.
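
One way to check this hunch is to tally the ISM rows of the count table by subtype and by whether they are mono-exonic. This is a minimal sketch, assuming a tab-separated table with columns named `transcript_novelty`, `ISM_subtype` and `n_exons`. Those column names and the function name are assumptions for illustration and may need adjusting to the actual file.

```python
import csv
from collections import Counter


def tally_ism_subtypes(lines):
    """Count ISM rows by (ISM_subtype, mono- vs multi-exonic).

    `lines` is any iterable of tab-separated text lines with a header row,
    e.g. an open file handle. Column names are assumed, not confirmed.
    """
    counts = Counter()
    for row in csv.DictReader(lines, delimiter="\t"):
        # Only look at transcripts labelled ISM.
        if row["transcript_novelty"] != "ISM":
            continue
        exon_class = "mono-exonic" if int(row["n_exons"]) == 1 else "multi-exonic"
        counts[(row["ISM_subtype"], exon_class)] += 1
    return counts
```

Passing an open file handle for the count table to `tally_ism_subtypes` and printing the result would show whether the ISM rows with subtype "None" are overwhelmingly mono-exonic, or whether multi-exonic models end up there too.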

kathryncrouch avatar Feb 16 '24 12:02 kathryncrouch

To add to this, I am also having some trouble understanding why some of the other models are characterised the way they are.

In these screenshots, the darker gene models at the top are the reference. The lighter models lower down are TALON output.

image The middle model is labelled NIC. Why? The intron in this model is not represented in the reference, how is this "in catalog"?

image The third model down is labelled NIC. Again, I don't understand why. The two models above it are labelled NNC, which makes more sense.

image The lower model has a truncated exon. I would expect this to be NNC, but it's annotated NIC.

image This model is annotated NIC, but has a completely novel intron.

image The top three TALON models make sense (NNC, known, genomic, working from the top down). The two below that are more confusing. The one with the blue arrow is annotated ISM prefix and the one with the green arrow is annotated ISM suffix. I don't fully understand the logic for either of these. I feel like both should be NNC because of the completely novel introns in the 5' UTR.

Are these annotations that I don't understand something to do with introns in UTRs rather than coding regions? Or are these models actually given multiple annotations and I'm only seeing one of them (these labels are derived from the count table produced by transcript_count)? Sorry if I'm missing something obvious, but I'm really stumped by these, and I can't see the answer by looking at the definitions either in your paper or the SQANTI paper.

kathryncrouch avatar Feb 16 '24 18:02 kathryncrouch

I believe that if the transcript is not classified as an Incomplete Splice Match, then it is assigned "None" in the ISM_subtype column. An ISM is a transcript that contains a subsection of an annotated transcript but does not extend all the way to the annotated 3′ or 5′ end.

What I don't understand is what the prefix, suffix, or both mean in the context of the ISM_subtype?

I was also having trouble finding this in the paper or elsewhere.
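
To make the question concrete, here is a toy sketch of one possible reading of the definition above. It is not TALON's actual code, and the labels it returns are my own guesses rather than documented behaviour. Each transcript is reduced to its ordered list of introns (5′ to 3′), and a query counts as an incomplete match if its introns form a contiguous run of a reference chain without covering the whole chain. The function names, the "internal" label and the coordinates are all hypothetical.

```python
def ism_ends(query, reference):
    """Return (touches_5p, touches_3p) if `query` is a proper contiguous
    run of `reference`, otherwise None. Both are lists of (start, end) introns."""
    n, m = len(query), len(reference)
    if n == 0 or n >= m:
        # No introns to compare, or not shorter than the reference chain.
        return None
    for start in range(m - n + 1):
        if reference[start:start + n] == query:
            return (start == 0, start + n == m)
    return None


def toy_subtype(query, references):
    """Guess a subtype label by comparing `query` against every reference chain."""
    matched = prefix = suffix = False
    for ref in references:
        ends = ism_ends(query, ref)
        if ends is None:
            continue
        matched = True
        prefix = prefix or ends[0]
        suffix = suffix or ends[1]
    if not matched:
        return "not an incomplete match"
    if prefix and suffix:
        return "both"
    if prefix:
        return "prefix"
    if suffix:
        return "suffix"
    return "internal"
```

Under this reading, "both" can only arise when the query is a 5′ run of one reference chain and a 3′ run of another, and a mono-exonic query (empty intron chain) never matches at all. Whether either of these corresponds to how TALON really assigns subtypes is exactly what I can't tell from the paper.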

jschroaderUAlbany avatar Apr 02 '24 02:04 jschroaderUAlbany