telomerecat
Zero length estimates
Hi. For some samples I get an estimated length of 0. What they all have in common is that F2a is negative (i.e. F4 > F2, because F2a = F2 - F4). Here's an example:
Sample,F1,F2,F4,Psi,Insert_mean,Insert_sd,Read_length,Initial_read_length,F2a,F2a_c,Length
1284_1000H.bam,549,397,711,1.255,430.0,105.505,125,125,-314,-314,0.0
Is there anything you can suggest? I'm using telomerecat version 3.2 installed from pip.
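For anyone who wants to check their own output for the same pattern, here is a minimal sketch in Python. It assumes the output is a CSV with the column names shown above, recomputes F2a as F2 - F4, and lists the rows where it is negative. The function name `flag_negative_f2a` is made up for illustration and is not part of telomerecat.

```python
import csv
import io

# Output copied from the report above; in practice, read the CSV file instead.
REPORT = """Sample,F1,F2,F4,Psi,Insert_mean,Insert_sd,Read_length,Initial_read_length,F2a,F2a_c,Length
1284_1000H.bam,549,397,711,1.255,430.0,105.505,125,125,-314,-314,0.0
"""


def flag_negative_f2a(text):
    """Return (sample, F2 - F4, reported length) for rows where F4 exceeds F2."""
    flagged = []
    for row in csv.DictReader(io.StringIO(text)):
        # F2a = F2 - F4, as described in the post above.
        f2a = int(row["F2"]) - int(row["F4"])
        if f2a < 0:
            flagged.append((row["Sample"], f2a, float(row["Length"])))
    return flagged


flagged = flag_negative_f2a(REPORT)
for sample, f2a, length in flagged:
    print(f"{sample}: F2a = {f2a}, reported length = {length}")
```

This only identifies affected samples; it says nothing about why F4 exceeds F2 for them.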
Hi thanks for opening the issue.
All of the telomere read counts look really low for this sample. Are the others like this? Has the file been preprocessed to remove "duplicated" reads? Telomere reads are often removed from BAM files because their repetitive nature means they map erroneously to parts of the genome.
Otherwise, I've been hearing reports that telomerecat is poorly suited to samples sequenced on NovaSeq machines. Telomerecat has not been tested on NovaSeq data yet, so I can't speak for its accuracy on those data.
Cheers!
Hi there,
I noticed that in the online documentation (https://telomerecat.readthedocs.io/en/latest/estimating_telomere.html) it says the F2a correction is default for the telbam2length command. However, for telomerecat version 3.2 you must use the -e flag for F2a correction...
@kgori: does adding the -e flag to your telbam2length improve your results?
@jhrf: I have a theory that the reports of telomerecat being poorly suited to run on NovaSeq data are likely due to people not being aware that they have to use this flag to batch correct. I think the effects of not batch correcting are just larger for NovaSeq than for HiSeqX data because the insert sizes used are much larger in the former... This is certainly true for data I am using where I have samples sequenced on both platforms.
Hello -
I am having a similar issue with a bit of an off-label application. I'm trying to apply this tool to non-human samples (scleractinian corals), and most of my datasets produce negative F2a counts and therefore uniformly zero-length telomere estimates. (Yes, corals have the canonical TTAGGG repeat sequence.) The datasets I've downloaded include WGS sequences from HiSeq2500 and HiSeq4000. I don't see how the F2a correction would help, because by my understanding it shrinks the counts toward the mean, which is still negative.
My total counts are considerably higher than the original poster's, so I don't think this is a sequencing depth / stochasticity issue (especially since the trend is very consistent across all samples).
Sample,F1,F2,F4,Psi,Insert_mean,Insert_sd,Read_length,Initial_read_length,F2a,F2a_c,Length
SRR7235985.bam,76995,10970,17182,1.014,356.648,67.045,150,150,-6212,-6212,0
What would the logic be for thinking the tool wouldn't work so well on NovaSeq data? I'm struggling to think of ways different error profiles or differences in things like the presence of adaptor sequences would produce this result.
Just found the discussion in the other repo. Never mind for now… https://github.com/cancerit/telomerecat/issues/35