Many reads fail QC during TALON run but meet primary, coverage, and identity filters

Open callumparr opened this issue 2 years ago • 4 comments

Using TALON v5, installed with python setup.py install, on an HPC cluster running Debian.

Python version: 3.6.7

```
talon --f /analysisdata/fantom6/Interactome/ONT-CAGE_TALON_Callum/F6_interactome_config_run2.csv --db /analysisdata/fantom6/Interactome/ONT-CAGE_TALON_Callum/F6_interactome.db --build hg38 --threads 12 --o /analysisdata/fantom6/Interactome/ONT-CAGE_TALON_Callum/F6_interactome_run2
```

I kept the default thresholds for minimum fraction aligned (0.9) and minimum identity (0.8).
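To make the two thresholds concrete, here is a minimal sketch of how fraction aligned and identity might be computed from a SAM record's CIGAR string and NM tag, and how the primary/coverage/identity checks combine. The formulas and the helper names (alignment_stats, passes_qc) are assumptions for illustration, not TALON's own code, which may define these quantities differently.

```python
import re
from typing import Tuple

# CIGAR operations as defined in the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_stats(cigar: str, nm: int) -> Tuple[float, float]:
    """Return (fraction_aligned, identity) for one alignment (hypothetical helper).

    Assumed definitions, not necessarily TALON's:
      fraction_aligned = query bases in M/=/X/I ops / full read length (clips included)
      identity         = matching bases / (M/=/X bases + inserted + deleted bases)
    `nm` is the SAM NM tag: mismatches + inserted bases + deleted bases.
    """
    counts = {}
    for length, op in CIGAR_RE.findall(cigar):
        counts[op] = counts.get(op, 0) + int(length)

    aligned = counts.get("M", 0) + counts.get("=", 0) + counts.get("X", 0)
    ins = counts.get("I", 0)
    dels = counts.get("D", 0)
    clipped = counts.get("S", 0) + counts.get("H", 0)

    read_length = aligned + ins + clipped
    span = aligned + ins + dels
    if read_length == 0 or span == 0:
        return 0.0, 0.0  # unmapped ("*") or fully clipped record

    fraction_aligned = (aligned + ins) / read_length
    mismatches = nm - ins - dels  # NM minus indel bases leaves substitutions
    matches = aligned - mismatches
    identity = matches / span
    return fraction_aligned, identity


def passes_qc(cigar: str, nm: int, is_primary: bool,
              min_cov: float = 0.9, min_identity: float = 0.8) -> bool:
    """Hypothetical QC check mirroring the three filters discussed in this issue."""
    frac, ident = alignment_stats(cigar, nm)
    return is_primary and frac >= min_cov and ident >= min_identity
```

For example, a perfectly matching 1,000 nt read whose last 120 nt are soft-clipped (CIGAR 880M120S) has a fraction aligned of 0.88 under these definitions, so it fails the 0.9 threshold even though its identity is 1.0.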

I was rooting through the TALON QC log file because we are seeing many reads filtered out, even though we used cap-trapping and oligo-dT priming, so I am confident the data are of good quality. I found one potential issue that may explain why many reads have a low fraction aligned: with my library prep, Pychopper is not effectively trimming the polyA tails from the FASTQ reads. However, I also saw an additional subset of alignments that were filtered out even though they were primary alignments and passed both the fraction aligned and identity filters.
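Assuming an untrimmed tail is soft-clipped because it does not match the genome, the tail length alone caps the achievable fraction aligned: a read of length L with a tail of length t can reach at most (L - t) / L, so passing a 0.9 threshold needs L >= t / (1 - 0.9) = 10t. A minimal sketch of that arithmetic; all function names are hypothetical, and the tail detector only counts an exact run of A's at the 3' end (or of T's at the 5' end for reverse-oriented reads).

```python
import math
from fractions import Fraction


def polya_tail_length(seq: str) -> int:
    """Length of the exact run of A's at the 3' end of a read (hypothetical helper)."""
    seq = seq.upper()
    return len(seq) - len(seq.rstrip("A"))


def polyt_head_length(seq: str) -> int:
    """Length of the exact run of T's at the 5' end, i.e. the tail of a reverse-oriented read."""
    seq = seq.upper()
    return len(seq) - len(seq.lstrip("T"))


def best_fraction_aligned(read_length: int, tail_length: int) -> float:
    """Upper bound on fraction aligned if the untrimmed tail is soft-clipped."""
    return (read_length - tail_length) / read_length


def min_read_length(tail_length: int, min_cov: float = 0.9) -> int:
    """Shortest read that can still reach min_cov with a soft-clipped tail of this length.

    Solves (L - t) / L >= min_cov, i.e. L >= t / (1 - min_cov).
    Fraction avoids float rounding, since 0.9 is not exact in binary.
    """
    return math.ceil(tail_length / (1 - Fraction(str(min_cov))))
```

With the default 0.9 threshold, a 60 nt untrimmed tail on its own fails any read shorter than 600 nt, even if the rest of the read aligns perfectly.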

I have attached an UpSet plot showing the reasons each alignment passed to TALON either passed or failed the QC step. The third column shows around 3.5M reads that failed with no reason given.

Looking through the TALON_label log, I saw roughly 0.5M reads with evidence of internal priming, but as I understand it, this does not factor into generating the TALON database.
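For context, internal priming is usually flagged by checking how A-rich the genome is just downstream of a read's 3' end, since oligo-dT can prime from a genomic A stretch rather than the true polyA tail. A minimal sketch of that check, assuming 0-based half-open alignment coordinates on the + strand and a 20 nt window; the function and its parameters are hypothetical and not TALON's code.

```python
def downstream_a_fraction(genome: str, start: int, end: int, strand: str,
                          window: int = 20) -> float:
    """Fraction of A's (on the transcript strand) in the `window` genomic bases past the 3' end.

    Hypothetical helper. `genome` is one chromosome's + strand sequence;
    `start`/`end` are the 0-based, half-open alignment coordinates.
    """
    if strand == "+":
        # 3' end is the alignment end; downstream continues to the right.
        region = genome[end:end + window]
        target = "A"
    elif strand == "-":
        # 3' end is the alignment start; downstream (in transcript direction) is to the left.
        region = genome[max(0, start - window):start]
        target = "T"  # an A on the transcript strand is a T on the + strand
    else:
        raise ValueError("strand must be '+' or '-'")

    if not region:
        return 0.0  # read ends at the chromosome boundary
    region = region.upper()
    return region.count(target) / len(region)
```

A high value suggests the read may come from internal priming. Consistent with the reply later in this thread, a label like this is meant for later filtering and should not change the QC numbers reported when the database is built.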

Is there some other behind-the-scenes filtering during database generation that isn't reported in the QC log?

[Figure: iPSC_rep1_run1_UpSetR, UpSet plot of QC pass/fail reasons]
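To reproduce the logic behind a plot like this, here is a minimal sketch that re-derives failure reasons for every row of the QC log and counts the rows that failed with none. It assumes a tab-separated log with the column names in COLS and 1/0 flags for passed_QC and primary_mapped; these are assumptions, so check them against the header of your own log.

```python
import csv
from collections import Counter
from typing import Iterable, Tuple

# Assumed column names and 1/0 flag encoding; check them against your log's header.
COLS = {
    "dataset": "dataset",
    "passed": "passed_QC",
    "primary": "primary_mapped",
    "cov": "fraction_aligned",
    "identity": "identity",
}


def _to_float(value: str) -> float:
    """Parse a numeric field, treating non-numeric values (e.g. 'NA') as 0.0."""
    try:
        return float(value)
    except ValueError:
        return 0.0


def tally_qc(lines: Iterable[str], min_cov: float = 0.9,
             min_identity: float = 0.8) -> Tuple[Counter, Counter]:
    """Count rows by QC outcome and recomputed failure reasons (hypothetical helper).

    Returns (combos, unexplained): `combos` maps an outcome label to a row count,
    `unexplained` counts failed rows with no recomputed reason, per dataset.
    """
    combos = Counter()
    unexplained = Counter()
    for row in csv.DictReader(lines, delimiter="\t"):
        reasons = []
        if row[COLS["primary"]] != "1":
            reasons.append("not_primary")
        if _to_float(row[COLS["cov"]]) < min_cov:
            reasons.append("low_coverage")
        if _to_float(row[COLS["identity"]]) < min_identity:
            reasons.append("low_identity")

        if row[COLS["passed"]] == "1":
            # A passing row with reasons would point to different definitions or thresholds.
            key = "passed" if not reasons else "passed_despite:" + "+".join(reasons)
        elif reasons:
            key = "+".join(reasons)
        else:
            key = "failed_no_reason"
            unexplained[row[COLS["dataset"]]] += 1
        combos[key] += 1
    return combos, unexplained
```

To run it on a real log, pass an open file handle, e.g. `with open(path) as fh: combos, unexplained = tally_qc(fh)`, where `path` is a placeholder. The per-dataset counts make it easy to see whether unexplained failures appear in every sample. Rows sitting exactly at a threshold may be classified differently if TALON's own comparison is strict rather than inclusive.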

callumparr, Apr 15 '22 10:04

Your intuition that internal priming / the reproducibility filter should not be affecting these numbers is correct.

Otherwise, I'm looking into it. I checked a log file I had lying around and found something similar :/ It does not seem to me that this should be happening. I will update you when I find anything.

fairliereese, Apr 20 '22 23:04

Thank you for the reply and for looking into it. When I have time, I will look into this type of failing read and its characteristics.

callumparr, Apr 21 '22 03:04

If you're also planning to look into it on your end, here's some code that might be useful as a starting point: https://github.com/fairliereese/220421_talon_debug/blob/master/check_talon_log.ipynb

fairliereese, Apr 21 '22 16:04

I looked into it a bit more and I am still at a loss as to why some reads are failing. The pattern was consistent across multiple samples, although they were all processed the same way, so it is possible I am doing something odd.

callumparr, May 19 '22 08:05