Optimal setting to identify alternate 3' and 5' ends

Open jchariker opened this issue 2 years ago • 3 comments

Hi, I'm interested in identifying alternate 3' and 5' ends of transcripts. What would be the best settings for this? I see the options --5p and --3p in talon_initialize_database, but I'm not quite sure what effect they have. Thanks!

jchariker avatar Apr 01 '22 21:04 jchariker

The --3p and --5p parameters set the maximum distance that an end observed from a read can be from the annotated end for the read to still be annotated as a known transcript. If a read matches the intron chain of an annotated transcript but its ends are not within the distance parameters, it will be annotated as a novel transcript. In contrast, if a read with the same intron chain has ends that are within the parameters, the end assigned to the read will simply be the end from the annotation.
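As a rough illustration of the rule described above, here is a minimal Python sketch. It is not TALON's actual code: the function name `assign_ends`, the `Assignment` type, the inclusive distance comparison, and the choice to keep the read's own ends for a novel call are all assumptions made for the example, and the threshold values in the usage lines are illustrative only. It also assumes the intron chain has already been matched to the annotated transcript.

```python
from typing import NamedTuple


class Assignment(NamedTuple):
    status: str       # "known" or "novel"
    five_prime: int   # 5' end coordinate assigned to the read
    three_prime: int  # 3' end coordinate assigned to the read


def assign_ends(read_5p: int, read_3p: int,
                annot_5p: int, annot_3p: int,
                max_5p_dist: int, max_3p_dist: int) -> Assignment:
    """Hypothetical sketch of the --5p/--3p distance check.

    Assumes the read's intron chain already matches the annotated transcript.
    """
    within_5p = abs(read_5p - annot_5p) <= max_5p_dist
    within_3p = abs(read_3p - annot_3p) <= max_3p_dist

    if within_5p and within_3p:
        # Read is called known; its observed ends are replaced by the
        # annotated ends, which is why the true ends are hidden.
        return Assignment("known", annot_5p, annot_3p)

    # At least one end is too far away: the read is called novel.
    # Keeping the read's own ends here is an assumption of this sketch.
    return Assignment("novel", read_5p, read_3p)


# Illustrative thresholds (500 bp at the 5' end, 300 bp at the 3' end).
close_read = assign_ends(1050, 5200, 1000, 5000, 500, 300)
far_read = assign_ends(1050, 5400, 1000, 5000, 500, 300)
print(close_read)
print(far_read)
```

In this sketch, tightening the two thresholds makes more reads with shifted ends come out as novel rather than being collapsed onto the annotated ends.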

This is rather unfortunate, as it obscures the actual ends of each read assigned to a known transcript, and it is a problem we're actively working on in the lab. The tool that we currently use to cluster and call 5'/3' ends from transcripts is lapa, which works on either the read_annot.tsv file output by TALON or the BAM files that you input to TALON.

fairliereese avatar Apr 05 '22 00:04 fairliereese

Hi @fairliereese, can you provide more details on how to use lapa?

It seems that we should use lapa_correct_talon instead of lapa?

Thanks, Yichao

YichaoOU avatar Jun 15 '22 15:06 YichaoOU

These are questions best suited for the lapa repo. I would ask again there.

fairliereese avatar Jun 15 '22 16:06 fairliereese