Many novel isoforms are very short, even 1 bp

Open callumparr opened this issue 1 year ago • 0 comments

I may have overlooked this, but I am seeing many TALON novel isoforms that seem illogically short. First, I was wondering how this could come about, since the alignments themselves are longer. Is this a case where there is fuzziness from many overlapping reads, and TALON tries to collapse them to an average 5' start somewhere in the middle of the overlap window, and likewise for the 3' end?

Second, does TALON do anything to limit this? Most of these seem to be genomic or antisense, but I would like to keep those categories in the hope of finding some intronic reads.

I am going to filter these out with a minimum length cutoff, but was wondering whether I was doing something wrong.

10007   2174489 ENSG00000132170.24      TALONT002174489 PPARG   TALONT002174489 1       1       Known   Genomic None    0       0       0       0       0       1
100117  2479590 TALONG000100117 TALONT002479590 TALONG000100117 TALONT002479590 1       1       Intergenic      Genomic None    0       0       0       0       0       1
10012   1172011 ENSG00000225526.6       TALONT001172011 MKRN2OS TALONT001172011 1       1       Known   Genomic None    0       0       1       0       0       0
100139  1243802 TALONG000100139 TALONT001243802 TALONG000100139 TALONT001243802 1       1       Antisense       Genomic None    0       0       1       0       0       0
100189  703048  TALONG000100189 TALONT000703048 TALONG000100189 TALONT000703048 1       1       Antisense       Antisense       None    1       0       0       0       3       0
100208  703095  TALONG000100208 TALONT000703095 TALONG000100208 TALONT000703095 1       1       Antisense       Antisense       None    7       0       0       0       6       0
10021   1868017 ENSG00000144713.13      TALONT001868017 RPL32   TALONT001868017 1       1       Known   Genomic None    0       0       0       0       1       0
10024   1279693 ENSG00000232746.1       TALONT001279693 LINC02022       TALONT001279693 1       1       Known   Genomic None    0       0       0       1       0       0
10025   1279730 ENSG00000144711.16      TALONT001279730 IQSEC1  TALONT001279730 1       1       Known   Genomic None    0       0       3       1       0       0
10025   2217324 ENSG00000144711.16      TALONT002217324 IQSEC1  TALONT002217324 1       1       Known   Genomic None    0       0       0       0       0       1
10025   303636  ENSG00000144711.16      TALONT000303636 IQSEC1  TALONT000303636 1       1       Known   Genomic None    1       1       0       0       0       0
100286  704104  TALONG000100286 TALONT000704104 TALONG000100286 TALONT000704104 1       1       Antisense       Antisense       None    1       0       0       0       2       0
10046   1291322 ENSG00000255021.1       TALONT001291322 ENSG00000255021 TALONT001291322 1       1       Known   Genomic None    0       0       1       0       0       0
100510  704743  TALONG000100510 TALONT000704743 TALONG000100510 TALONT000704743 1       1       Antisense       Antisense       None    1       0       0       0       2       0
100512  704746  TALONG000100512 TALONT000704746 TALONG000100512 TALONT000704746 1       1       Antisense       Antisense       None    2       0       0       0       2       0
100514  704750  TALONG000100514 TALONT000704750 TALONG000100514 TALONT000704750 1       1       Antisense       Antisense       None    3       0       0       0       1       0
10053   305767  ENSG00000283392.1       TALONT000305767 ENSG00000283392 TALONT000305767 1       1       Known   Genomic None    1       0       0       0       0       0
10053   305773  ENSG00000283392.1       TALONT000305773 ENSG00000283392 TALONT000305773 1       1       Known   Genomic None    3       1       6       10      6       4
100553  2201800 TALONG000100553 TALONT002201800 TALONG000100553 TALONT002201800 1       1       Intergenic      Genomic None    0       0       0       0       0       1
100560  705022  TALONG000100560 TALONT000705022 TALONG000100560 TALONT000705022 1       1       Intergenic      Genomic None    1       0       0       0       0       0
100598  705420  TALONG000100598 TALONT000705420 TALONG000100598 TALONT000705420 1       1       Intergenic      Intergenic      None    1       0       0       0       0       1
100654  706224  TALONG000100654 TALONT000706224 TALONG000100654 TALONT000706224 1       1       Antisense       Antisense       None    2       0       0       0       0       0
100668  706279  TALONG000100668 TALONT000706279 TALONG000100668 TALONT000706279 1       1       Antisense       Antisense       None    1       0       0       0       0       0
100694  706483  TALONG000100694 TALONT000706483 TALONG000100694 TALONT000706483 1       1       Antisense       Antisense       None    1       0       0       0       2       0
100725  706687  TALONG000100725 TALONT000706687 TALONG000100725 TALONT000706687 1       1       Antisense       Antisense       None    3       0       0       0       8       0
100766  1902097 TALONG000100766 TALONT001902097 TALONG000100766 TALONT001902097 1       1       Intergenic      Genomic None    0       0       0       0       2       0
100794  706828  TALONG000100794 TALONT000706828 TALONG000100794 TALONT000706828 1       1       Antisense       Antisense       None    4       0       0       0       10      0
10083   308267  ENSG00000206560.12      TALONT000308267 ANKRD28 TALONT000308267 1       1       Known   Genomic None    1       1       6       6       1       1
100843  707002  TALONG000100843 TALONT000707002 TALONG000100843 TALONT000707002 1       1       Antisense       Antisense       None    1       0       1       5       0       0
100863  707199  TALONG000100863 TALONT000707199 TALONG000100863 TALONT000707199 1       1       Antisense       Antisense       None    1       0       0       0       0       0
100894  707439  TALONG000100894 TALONT000707439 TALONG000100894 TALONT000707439 1       1       Antisense       Antisense       None    1       0       0       0       0       0
100914  707837  TALONG000100914 TALONT000707837 TALONG000100914 TALONT000707837 1       1       Antisense       Antisense       None    3       0       1       1       0       0
100935  707977  TALONG000100935 TALONT000707977 TALONG000100935 TALONT000707977 1       1       Antisense       Antisense       None    4       0       0       0       6       0
100961  708106  TALONG000100961 TALONT000708106 TALONG000100961 TALONT000708106 1       1       Antisense       Antisense       None    2       0       0       0       0       0
100966  708159  TALONG000100966 TALONT000708159 TALONG000100966 TALONT000708159 1       1       Antisense       Antisense       None    1       2       0       2       0       2
100969  708271  TALONG000100969 TALONT000708271 TALONG000100969 TALONT000708271 1       1       Antisense       Antisense       None    3       0       0       0       0       0
10097   308755  ENSG00000131378.14      TALONT000308755 RFTN1   TALONT000308755 1       1       Known   Genomic None    1       0       0       0       1       0
100973  708334  TALONG000100973 TALONT000708334 TALONG000100973 TALONT000708334 1       1       Antisense       Antisense       None    1       0       0       0       3       0
100984  708429  TALONG000100984 TALONT000708429 TALONG000100984 TALONT000708429 1       1       Antisense       Genomic None    1       0       0       1       0       0
101036  708824  TALONG000101036 TALONT000708824 TALONG000101036 TALONT000708824 1       1       Antisense       Antisense       None    1       0       0       0       0       0
101056  708855  TALONG000101056 TALONT000708855 TALONG000101056 TALONT000708855 1       1       Antisense       Antisense       None    1       0       0       0       0       0
10108   872958  ENSG00000154822.18      TALONT000872958 PLCL2   TALONT000872958 1       1       Known   Genomic None    0       1       0       0       0       0
101100  709087  TALONG000101100 TALONT000709087 TALONG000101100 TALONT000709087 1       1       Antisense       Antisense       None    2       2       0       0       0       0
10112   1297169 ENSG00000131374.14      TALONT001297169 TBC1D5  TALONT001297169 1       1       Known   Genomic None    0       0       0       1       0       0
10122   873143  ENSG00000182568.17      TALONT000873143 SATB1   TALONT000873143 1       1       Known   Genomic None    0       8       4       3       0       0
10127   1297456 ENSG00000183960.9       TALONT001297456 KCNH8   TALONT001297456 1       1       Known   Genomic None    0       0       1       0       0       3
101282  711659  TALONG000101282 TALONT000711659 TALONG000101282 TALONT000711659 1       1       Antisense       Antisense       None    1       0       0       0       2       0

There are roughly 3,500 novel isoforms shorter than 20 bp, out of some 320,000 expressed isoforms (about 1%).
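
For reference, the minimum-length filter I have in mind could be sketched roughly as below. This is only a sketch under assumptions: I am assuming the 8th tab-separated field of each row is the transcript length (the rows above all show 1 there), and the function name `filter_min_length`, the 20 bp default, and the third sample row (`HYPOTHETICAL_G1` / `HYPOTHETICAL_T1`) are made up for illustration and are not part of TALON.

```python
import csv
import io

# Assumption (not confirmed in this post): the 8th tab-separated field
# holds the transcript length in bp. Zero-based index 7.
LENGTH_COL = 7


def filter_min_length(lines, min_length=20):
    """Yield rows whose assumed length field is at least min_length.

    Rows whose length field is not an integer (for example a header line)
    are passed through unchanged; rows that are too short to have a
    length field are dropped.
    """
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) <= LENGTH_COL:
            continue  # blank or truncated line
        try:
            length = int(row[LENGTH_COL])
        except ValueError:
            yield row  # probably a header; keep it
            continue
        if length >= min_length:
            yield row


# Two rows taken from the table above plus one hypothetical longer row.
sample_rows = [
    ["10007", "2174489", "ENSG00000132170.24", "TALONT002174489", "PPARG",
     "TALONT002174489", "1", "1", "Known", "Genomic", "None",
     "0", "0", "0", "0", "0", "1"],
    ["100189", "703048", "TALONG000100189", "TALONT000703048",
     "TALONG000100189", "TALONT000703048", "1", "1", "Antisense",
     "Antisense", "None", "1", "0", "0", "0", "3", "0"],
    # Hypothetical row, not from my data: a 1500 bp transcript.
    ["1", "2", "HYPOTHETICAL_G1", "HYPOTHETICAL_T1", "HYPOTHETICAL_G1",
     "HYPOTHETICAL_T1", "3", "1500", "Known", "Known", "None",
     "5", "0", "0", "0", "0", "0"],
]
sample_text = "\n".join("\t".join(r) for r in sample_rows) + "\n"

kept = list(filter_min_length(io.StringIO(sample_text), min_length=20))
for row in kept:
    print("\t".join(row))
```

With the sample rows, only the hypothetical 1500 bp row survives the 20 bp cutoff. On a real file the same function can be fed an open file handle instead of the `io.StringIO` buffer.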

callumparr, Aug 15 '22 15:08