
trying to understand the `OUT.novel_vs_known.SQANTI-like.tsv`

Open sparthib opened this issue 1 year ago • 4 comments

In the OUT.novel_vs_known.SQANTI-like.tsv file, are all the entries for novel transcripts? If so, I'm trying to wrap my head around why some of them are categorized as FSM. Any pointers on this would be helpful, thank you!

Sowmya

sparthib avatar Jun 13 '24 19:06 sparthib

@sparthib

Yes, the SQANTI-like output contains only information about novel transcripts. Could you send me an example? It would also be nice to see both the novel and the assigned known transcripts from the GTF file.

Best Andrey

andrewprzh avatar Jun 14 '24 17:06 andrewprzh

Pasting a few lines from the file here.

transcript372.chr1.nnic chr1    +       2357    11      novel_not_in_catalog    ENSG00000187634.13      ENST00000341065.8       2191       12      76      1       76      0       alternative_structure_novel;correct_polya_site_right    FALSE   True    NA      NA         NA      NA      NA      NA      NA      False   NA      NA      NA      C       NA      NA      NA      NA      NA      930312     944150  NA      0.05    TCCCGTGTCTACTGCCTCCC    NA      NA      NA      NA      NA      NA      NA      NA      NA
transcript376.chr1.nnic chr1    +       2405    11      novel_not_in_catalog    ENSG00000187634.13      ENST00000342066.8       2557       14      395     0       76      0       alternative_structure_novel;terminal_site_match_right_precise;correct_polya_site_right     FALSE   True    NA      NA      NA      NA      NA      NA      NA      False   NA      NA      NA      C       NA      NA         NA      NA      NA      925942  944150  NA      0.05    TCCCGTGTCTACTGCCTCCC    NA      NA      NA      NA      NA      NA         NA      NA      NA
transcript450.chr1.nnic chr1    -       2174    11      novel_not_in_catalog    ENSG00000279457.4       ENST00000623083.4       1397       10      -493    -297    -493    -297    intron_shift;extra_intron_flanking_right;alternative_polya_site_left    FALSE   True       NA      NA      NA      NA      NA      NA      NA      False   NA      NA      NA      C       NA      NA      NA      NA         NA      -1      -1      NA      0.30    TATTAAAAGCACACTGTTGG    NA      NA      NA      NA      NA      NA      NA      NA         NA
transcript499.chr1.nic  chr1    -       6633    11      novel_in_catalog        ENSG00000131591.18      ENST00000421241.7       1832       10      30      0       28      0       alternative_structure_known;terminal_site_match_left_precise;correct_polya_site_left       FALSE   True    NA      NA      NA      NA      NA      NA      NA      True    NA      NA      NA      C       NA      NA         NA      NA      NA      1091543 1082896 NA      0.05    AGAGCAGCTCGGAACGCAGC    NA      NA      NA      NA      NA      NA         NA      NA      NA

The file has ~30k lines, whereas the counts file has over 200k lines, so it makes sense that novel_vs_known only categorizes the novel transcripts.

Follow-up question: under the additional info column I see terms that I don't see when I run SQANTI manually on my bambu output. Could you explain more how this column works and what are all the possible subcategories I could observe here? Additionally, as seen above, I observe mostly NAs in the rest of the columns, but I am unsure what these columns correspond to, so a header or a description of these columns would be helpful. Thank you so much @andrewprzh!

sparthib avatar Jun 14 '24 17:06 sparthib
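Until the file has headers, one way to see which categories and subcategory terms actually occur is to tally them by column position. The sketch below is only an illustration: the column positions (`CATEGORY_COL`, `EVENTS_COL`) are inferred from the rows pasted above, not from IsoQuant documentation, and the function name `summarize` is made up for this example. It also assumes the file is tab-separated and has no header line.

```python
import csv
from collections import Counter

# 0-based column positions, inferred from the pasted rows (an assumption):
# column 5 holds the structural category (e.g. novel_not_in_catalog),
# column 14 holds the semicolon-separated list of events.
CATEGORY_COL = 5
EVENTS_COL = 14


def summarize(lines):
    """Count structural categories and individual events in headerless TSV lines."""
    categories = Counter()
    events = Counter()
    for row in csv.reader(lines, delimiter="\t"):
        # Skip rows that are too short to contain the events column.
        if len(row) <= EVENTS_COL:
            continue
        categories[row[CATEGORY_COL]] += 1
        # Split "a;b;c" into separate events and ignore empty pieces.
        events.update(e for e in row[EVENTS_COL].split(";") if e)
    return categories, events
```

To run it on a real file, pass an open file handle, for example `summarize(open("OUT.novel_vs_known.SQANTI-like.tsv", newline=""))`, and print `categories.most_common()` and `events.most_common()`. If a later release adds a header line, that line would be counted as a row and should be skipped first.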

@sparthib

I agree, headers would be nice. I'll add them in the next release. I think I used information from the SQANTI wiki, but it might have changed over time.

Moreover, IsoQuant does not produce the exact SQANTI output, so many of the columns are NA (it would take a lot of time to re-implement all the features). If you'd like the full SQANTI output, it's better to run SQANTI itself :)

> under the additional info column I see terms that I don't see when I run SQANTI manually on my bambu output. Could you explain more how this column works and what are all the possible subcategories I could observe here?

Sorry, which column are you referring to?

Best Andrey

andrewprzh avatar Jun 24 '24 12:06 andrewprzh

@sparthib

I think I found the reason for the original problem with FSM records. IsoQuant outputs a few novel isoforms that are very similar to the reference ones, which should not be there. I'll make a bug-fix release soon.

andrewprzh avatar Jul 04 '24 18:07 andrewprzh

@sparthib

This issue should now be resolved starting with version 3.5.0.

andrewprzh avatar Aug 27 '24 10:08 andrewprzh