
Multiple samples and single samples produce some transcripts of GTF somewhat differently

Open Tang-pro opened this issue 10 months ago • 7 comments

Hi @andrewprzh

Thanks in advance.

For context, SUPPA2 derives alternative splicing events from a GTF file.

When I run IsoQuant on a single sample, the resulting GTF contains an intron retention event for one gene. When I run it on all samples together (3 periods, 2 replicates each), the combined GTF contains no intron retention for this gene at all. What is the reason for this? Is the structural annotation obtained from a single sample inaccurate? The specific case is described in this SUPPA2 issue:

https://github.com/comprna/SUPPA/issues/207
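One way to check where the event disappears is to compare the two GTFs directly for the gene in question. The following is a minimal sketch, not SUPPA2's actual event definition: it treats an intron as "retained" when another transcript of the same gene has a single exon spanning it. The function names and the demo records (`G1`, `tx1`, `tx2`) are hypothetical.

```python
from collections import defaultdict


def parse_exons(gtf_lines):
    """Collect sorted exon intervals per (gene_id, transcript_id) from GTF lines."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        # Parse the attribute column: key "value"; key "value"; ...
        attrs = {}
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                key, _, value = item.partition(" ")
                attrs[key] = value.strip('"')
        key = (attrs["gene_id"], attrs["transcript_id"])
        exons[key].append((int(fields[3]), int(fields[4])))
    return {k: sorted(v) for k, v in exons.items()}


def introns(exon_list):
    """Introns are the gaps between consecutive exons (1-based, inclusive)."""
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(exon_list, exon_list[1:])]


def retained_introns(exons_by_tx, gene_id):
    """Simplified check: an intron of one transcript lies strictly inside
    a single exon of another transcript of the same gene."""
    txs = {tx: ex for (g, tx), ex in exons_by_tx.items() if g == gene_id}
    events = set()
    for tx_a, ex_a in txs.items():
        for i_start, i_end in introns(ex_a):
            for tx_b, ex_b in txs.items():
                if tx_b != tx_a and any(s < i_start and i_end < e for s, e in ex_b):
                    events.add((i_start, i_end, tx_a, tx_b))
    return sorted(events)


def exon_line(gene, tx, start, end):
    """Build a made-up GTF exon record for the demo."""
    attrs = f'gene_id "{gene}"; transcript_id "{tx}";'
    return "\t".join(["chr1", "demo", "exon", str(start), str(end), ".", "+", ".", attrs])


# Single-sample GTF: tx2 retains the intron of tx1.
single = [exon_line("G1", "tx1", 100, 200), exon_line("G1", "tx1", 300, 400),
          exon_line("G1", "tx2", 100, 400)]
# Combined GTF: tx2 was not reported, so the event is gone.
combined = [exon_line("G1", "tx1", 100, 200), exon_line("G1", "tx1", 300, 400)]

print(retained_introns(parse_exons(single), "G1"))
print(retained_introns(parse_exons(combined), "G1"))
```

If the retaining transcript is simply absent from the combined GTF, the difference comes from which isoforms were reported, not from how the events were called.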

By the way, is it still necessary to run SQANTI3 on the structural annotation file produced by IsoQuant? When I ran SQANTI3 on the GTF from IsoQuant, the structural annotations of some transcripts came out different.

Tang-pro avatar Feb 17 '25 09:02 Tang-pro

Dear @Tang-pro

In general, it's quite hard to predict the outcome of the transcript discovery algorithm: it uses a lot of different cut-offs, including cut-offs relative to gene expression. Thus, when a gene gets more reads, some novel isoforms may end up with insufficient read support relative to the gene. Moreover, when several replicates are provided, IsoQuant reports a novel isoform only if it is confirmed by at least 2 of them. So some of the novel isoforms may also have been lost due to lack of support across replicates/samples.
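To make the two effects concrete, here is a toy sketch. It is not IsoQuant's actual code, and the thresholds (`min_fraction`, `min_reads`, `min_files`) and function names are made-up values for illustration only.

```python
def passes_relative_cutoff(isoform_reads, gene_reads, min_fraction=0.05, min_reads=3):
    """Toy cut-off: an isoform needs an absolute minimum number of reads
    and a minimum share of all reads assigned to its gene."""
    return isoform_reads >= min_reads and isoform_reads >= min_fraction * gene_reads


def supported_by_enough_files(reads_per_file, min_files=2):
    """Toy replicate check: the isoform must have reads in at least min_files inputs."""
    return sum(1 for n in reads_per_file.values() if n > 0) >= min_files


# One sample alone: 6 of 100 gene reads support the novel isoform (6%).
print(passes_relative_cutoff(6, 100))

# Six files pooled: the same 6 reads against 600 gene reads (1%).
print(passes_relative_cutoff(6, 600))

# The isoform is seen in only one of the files.
print(supported_by_enough_files({"rep1": 6, "rep2": 0, "rep3": 0}))
```

Under these assumed thresholds the isoform passes in the single-sample run but fails both checks in the combined run, which matches the behaviour described in the question.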

> By the way, is it still necessary to do SQANTI3 again for the structural annotation file obtained with Isoquant, because when I did SQANTI3 analysis with the GTF obtained with isoquant, it was found that the structural annotations of some transcripts were different.

Could you send me an example where SQANTI and IsoQuant output differs?

Best Andrey

andrewprzh avatar Feb 20 '25 00:02 andrewprzh

Image

Hi @andrewprzh

The top track is the original reference GTF obtained from short-read RNA-seq (it contains only gene-level records and serves as the reference), the second is the transcript_models.gtf produced by IsoQuant, the third is the corrected GTF produced by SQANTI3, and the fourth contains the predicted CDS.

Tang-pro avatar Feb 20 '25 02:02 Tang-pro

@Tang-pro

Yes, IsoQuant does not predict CDS, so it makes sense to use other tools.

Could you send me this part of the GTF files? It looks odd, but I guess I won't be able to say much without the reads. The only way to understand it is to go deeper into every specific case and see what the algorithm actually does.

Best Andrey

andrewprzh avatar Mar 06 '25 17:03 andrewprzh

@andrewprzh

Thanks! The data are confidential, so I've sent them directly to your email.

Tang-pro avatar Mar 08 '25 06:03 Tang-pro

Hi Andrey,

A question somewhat related to this. I have PacBio FLNC reads for stem and leaf (5 biological replicates each). Are there any downsides to giving IsoQuant all FLNC samples (5 stem, 5 leaf) as one big concatenated FASTQ file? My objective is to construct a high-quality transcriptome, so measuring expression levels is not critical for me at this point.

Thanks Abhijit

sanyalab avatar Mar 10 '25 13:03 sanyalab

@sanyalab Hi,

If you just want to construct a relatively complete full-length transcriptome, I think that is possible.

Tang-pro avatar Mar 11 '25 01:03 Tang-pro

@sanyalab

Although the results might be very similar, I would still recommend providing separate files. If 2 or more files are provided, IsoQuant additionally checks that every reported transcript is supported by at least 2 files, which can reduce false positives.
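As a rough sketch of what "separate files" could look like in practice, the snippet below assembles a command line that passes each replicate as its own `--fastq` argument instead of one concatenated file. The file names and the helper `build_isoquant_command` are hypothetical, and the flags (`--reference`, `--genedb`, `--fastq`, `--data_type`, `-o`) are given as I understand them from the IsoQuant README, so please check them against `isoquant.py --help` for your version.

```python
import shlex


def build_isoquant_command(reference, fastq_files, output_dir,
                           data_type="pacbio_ccs", genedb=None):
    """Assemble an IsoQuant call with every replicate as a separate input file.

    Flag names are assumed from the IsoQuant README; verify before running.
    """
    cmd = ["isoquant.py", "--reference", reference, "--data_type", data_type]
    if genedb is not None:
        cmd += ["--genedb", genedb]
    cmd += ["--fastq", *fastq_files]
    cmd += ["-o", output_dir]
    return cmd


# Hypothetical file names: 5 stem and 5 leaf replicates.
fastqs = [f"{tissue}_rep{i}.flnc.fastq" for tissue in ("stem", "leaf") for i in range(1, 6)]
cmd = build_isoquant_command("genome.fasta", fastqs, "isoquant_out")

# Print the command for inspection instead of executing it.
print(shlex.join(cmd))
```

The command is only printed here; run it yourself (or pass the list to `subprocess.run`) once the paths and flags are confirmed.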

Best Andrey

andrewprzh avatar Mar 16 '25 21:03 andrewprzh