[Question] --threshold and -k for ancient DNA

Open EisenRa opened this issue 6 years ago • 2 comments

Dear Erik,

Thanks for writing this tool!

I have a quick question regarding the use of this tool for ancient DNA data. Some of my work is on degraded DNA, whose fragment lengths are typically log-normally distributed with a mode of ~50 bp. You mentioned that a k of 4 and a threshold of 0.55 work well for 64-120 bp sequences, and I am wondering if you've tested shorter sequences (30-64 bp)?

If not, I'll have a play around with some of my data and get back to this thread.

Additionally, a feature that may be useful is the ability to provide an output file for the filtered sequences (I can make a feature request if you think it's worthwhile).

Cheers, Raphael

EisenRa, Dec 05 '18 04:12

Hi Raphael, sorry for the delayed reply. We got those numbers by plotting a histogram of scores from sequences derived from low-biomass samples that contained both authentic (high-complexity) and likely contaminant (low-complexity) DNA. The histogram was bimodal, with one mode at the low end and one at the high end. We chose 0.55 as the threshold between the two modes that separated them best.

I would suggest you do the same to validate that threshold or choose a new one. I suspect that 0.55 would still be a reasonable number since we normalized for sequence length, but it's always good to check. Let me know if that makes sense.

I also like that feature idea, though my bandwidth is quite constrained these days. If you make it a separate feature request I can keep it in my queue.

Best, Erik

eclarke, Feb 26 '19 16:02

Dear Erik,

No problem, thanks for responding! That makes sense. Do you have a script/markdown available to generate such a histogram -- such that I don't have to reinvent the wheel?

I'll make a feature request.

Thanks, Raphael

EisenRa, Mar 14 '19 03:03
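
For anyone landing on this thread with the same question, here is a minimal Python sketch of the histogram approach described above. It is not the script Erik's group used. It assumes the score is the number of unique k-mers divided by the read length (the thread only says the score is normalized for sequence length), and every function name here (`complexity_score`, `read_fastq_sequences`, `histogram`, `valley_threshold`) is hypothetical. The valley search also assumes one mode sits in the lower half of the score range and one in the upper half, so always inspect the printed histogram rather than trusting the suggested number.

```python
def complexity_score(seq, k=4):
    """Unique k-mers divided by read length (ASSUMED score definition)."""
    seq = seq.upper()
    if len(seq) < k:
        return 0.0
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return len(kmers) / len(seq)


def read_fastq_sequences(path):
    """Yield the sequence line of each 4-line record in an uncompressed FASTQ."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:
                yield line.strip()


def histogram(scores, bins=20):
    """Count scores in equal-width bins over [0, 1]."""
    counts = [0] * bins
    for score in scores:
        index = min(int(score * bins), bins - 1)  # a score of 1.0 goes in the last bin
        counts[index] += 1
    return counts


def valley_threshold(counts):
    """Centre of the emptiest bin between the low-half peak and the high-half peak.

    Assumes a bimodal histogram with one mode in each half of the range.
    """
    bins = len(counts)
    half = bins // 2
    low_peak = max(range(half), key=counts.__getitem__)
    high_peak = max(range(half, bins), key=counts.__getitem__)
    between = range(low_peak, high_peak + 1)
    lowest = min(counts[i] for i in between)
    candidates = [i for i in between if counts[i] == lowest]
    valley = candidates[len(candidates) // 2]  # middle of the emptiest stretch
    return (valley + 0.5) / bins


if __name__ == "__main__":
    # Made-up toy reads for illustration only; replace with
    # list(read_fastq_sequences("your_reads.fastq")) for real data.
    reads = [
        "A" * 40,
        "AT" * 20,
        "ACGTTGCAAGCTTAGGCTAACGTCGATCGGATTACAGCTA",
        "GATTACAGGCTTAACCGGTACGATCGTAGCTAGGCTTACA",
    ]
    counts = histogram([complexity_score(r, k=4) for r in reads], bins=20)
    for index, count in enumerate(counts):
        print(f"{index / 20:.2f} {'#' * count}")
    print("suggested threshold:", valley_threshold(counts))
```

Under the assumed score definition, a read of length L has at most L - k + 1 distinct k-mers, so with k = 4 the highest possible score is 27/30 = 0.90 for a 30 bp read and 117/120 = 0.975 for a 120 bp read. The ceiling moves only slightly across that range, which is consistent with the suggestion that 0.55 may carry over to shorter reads, but the histogram of your own data is the real check.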