How to run downstream differential analysis with an expanded transcriptome generated by TALON.

Open callumparr opened this issue 1 year ago • 0 comments

I was wondering what your general workflow is for working with an expanded annotation from TALON for differential gene and transcript expression analysis.

After generating an expanded annotation from the human GENCODE v39 annotation, I have the raw abundances of the reads that were aligned during TALON database generation. I saw there is a way to import TALON results directly into IsoformSwitchAnalyzeR. However, 1) some reads are excluded by the identity and coverage thresholds, and 2) we filter out isoforms (and with them their supporting reads) with low read support and reproducibility. So I thought it would make sense to go back to the original raw FASTQ input data and align it to the newly expanded transcriptome, making use of all the reads, then use these alignments for salmon and import the salmon results into IsoformSwitchAnalyzeR.

I used talon_create_GTF to get an annotation from the database and then added back any undetected annotations from the original GENCODE v39. Using gffread, I created a transcriptome file and repeated the mapping.
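For reference, the add-back step can be sketched roughly as below. This is only a minimal illustration, not the exact script I ran: it assumes the TALON GTF keeps the original GENCODE `transcript_id` values (including version suffixes) for known transcripts, and the function names are made up for this example.

```python
import re

# Matches the transcript_id attribute in the ninth GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_ids(gtf_lines):
    """Collect every transcript_id seen in an iterable of GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        match = TRANSCRIPT_ID.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def missing_reference_records(talon_lines, reference_lines):
    """Return reference GTF records whose transcript_id is absent from the TALON GTF.

    Assumption: known transcripts carry identical IDs in both files.
    Gene-level records have no transcript_id and are skipped here.
    """
    seen = transcript_ids(talon_lines)
    kept = []
    for line in reference_lines:
        if line.startswith("#"):
            continue
        match = TRANSCRIPT_ID.search(line)
        if match and match.group(1) not in seen:
            kept.append(line)
    return kept
```

The returned records would then be appended to the TALON GTF before running gffread. Because gene-level lines are dropped in this sketch, any tool that requires them would need those added back separately.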

Whereas I was getting only ~1% of reads unmapped with minimap2 against the original GENCODE v39 transcriptome, I am now, for some reason, getting ~10% of reads unmapped. In my libraries I see a lot of snRNA expression, as we polyadenylate the total RNA and then use cap-trapper to get full-length capped RNAs, and these snRNA reads tend to be the ones that can no longer be mapped.
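In case it helps to compare like with like, here is a rough sketch of one way to compute the unmapped fraction from SAM text. It assumes each read has exactly one non-secondary, non-supplementary record, so those records are used as the per-read count; the function name is only illustrative.

```python
def unmapped_fraction(sam_lines):
    """Fraction of reads flagged unmapped, counting one record per read.

    Secondary (0x100) and supplementary (0x800) records are ignored so
    that multi-mapping reads are not counted more than once.
    """
    total = 0
    unmapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # malformed or empty line
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if flag & 0x4:  # 0x4 marks an unmapped read
            unmapped += 1
    return unmapped / total if total else 0.0
```

Running this on the SAM output from both the original and the expanded transcriptome gives two directly comparable numbers.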

If anything, I expected the number of unmapped reads to decrease, because we added more loci to the transcriptome and so more reads should be 'rescued' (does this sound reasonable?), but it seems to have the opposite effect. I was wondering whether you see similar things when you compare alignments to the input transcriptome with alignments to the output transcriptome from TALON.

We were advised by lh3 to increase the float passed to the -f flag when running minimap2 with the map-ont preset, and this does in fact reduce the number of unmapped reads when aligning to the new transcriptome. I am not sure why, though.
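For anyone wanting to try the same thing, the call can be put together roughly like this. It is only a sketch: the file names and the `-f` value in the usage lines are placeholders, not the settings we used, and the helper name is made up.

```python
import subprocess


def minimap2_command(transcriptome_fasta, reads_fastq, repeat_fraction, threads=4):
    """Build a minimap2 argument list with an explicit -f value.

    -f sets the top fraction of repetitive minimizers that minimap2
    filters out; repeat_fraction here is whatever value you want to test.
    """
    return [
        "minimap2",
        "-ax", "map-ont",            # ONT preset, SAM output
        "-f", str(repeat_fraction),  # the setting being varied
        "-t", str(threads),
        transcriptome_fasta,
        reads_fastq,
    ]


# Example usage (placeholder paths and value; requires minimap2 on PATH):
# cmd = minimap2_command("expanded_transcriptome.fa", "reads.fastq", 0.001)
# with open("aligned.sam", "w") as out:
#     subprocess.run(cmd, stdout=out, check=True)
```

Keeping the command in one function makes it easy to rerun the alignment with several `-f` values and compare the unmapped fractions.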

callumparr, Aug 31 '22 12:08