Add in min read length threshold?

Open megancamilla opened this issue 3 years ago • 3 comments

Hi mosdepth peeps,

Just wondering if it would be possible to add in a feature from samtools depth to this tool.

Specifically, something like samtools depth -l 5000, where the option is documented as:

-l <int> read length threshold (ignore reads shorter than <int>) [0]

We are working with Nanopore reads and want to look at coverage using only reads that are >10kb, which is what we used for our de novo assembly. (Yes, we could trim our raw fastq files, but this is a pretty time-intensive step with the current tools.)

Cheers and thanks! Megan

megancamilla avatar Jul 08 '20 04:07 megancamilla
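
For readers following along, the filter requested above can be sketched in a few lines of Python. This is only an illustration of the idea (skip any alignment whose read is shorter than a threshold, then count depth), not how mosdepth or samtools implement it. The tuple layout (ref_start, ref_end, read_length), the function name depth_with_min_length, and the toy coordinates are all assumptions made up for this example.

```python
from typing import List, Tuple

# Hypothetical record layout: (ref_start, ref_end, read_length),
# with 0-based, half-open reference coordinates.
Alignment = Tuple[int, int, int]


def depth_with_min_length(
    alignments: List[Alignment], ref_len: int, min_read_len: int = 0
) -> List[int]:
    """Per-base depth over [0, ref_len), ignoring reads shorter than min_read_len."""
    # Difference array: +1 where an alignment starts, -1 where it ends.
    diff = [0] * (ref_len + 1)
    for start, end, read_len in alignments:
        if read_len < min_read_len:
            continue  # same idea as `samtools depth -l <int>`
        start = max(0, start)
        end = min(ref_len, end)
        if start >= end:
            continue  # alignment lies outside the reference
        diff[start] += 1
        diff[end] -= 1

    # A running sum of the difference array gives the depth at each base.
    depth = []
    running = 0
    for delta in diff[:ref_len]:
        running += delta
        depth.append(running)
    return depth


# Toy data: one short read and two reads longer than 10 kb.
reads = [(0, 5, 5), (2, 8, 12000), (4, 10, 15000)]
all_reads_depth = depth_with_min_length(reads, ref_len=10)
long_reads_depth = depth_with_min_length(reads, ref_len=10, min_read_len=10000)
```

With a real BAM file, the read length and reference span would come from the alignment records themselves; the sketch keeps them as plain tuples so it runs without any dependencies.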

This is pretty easy to implement. Would like to get others to weigh in on how broadly useful this is. Is this something that's commonly needed for long-read data?

cc @wdecoster

brentp avatar Jul 08 '20 15:07 brentp

It's not something that I personally have needed already - but I can see some value in it. I can imagine that you want to know the coverage obtained from the "informative/useful" alignments, discarding the reads that were not optimal for assembly or SV calling.

wdecoster avatar Jul 08 '20 19:07 wdecoster

Yes, we assemble de novo primarily with Canu, which has a minimum read length cut-off, so it just discards reads shorter than X before going into the k-mer counts and overlaps to generate error-corrected reads. I was aligning Canu "trimmed" or "corrected" reads back to my genome, but it has been suggested (I forget where I read/heard this) that we really should use the raw data to do this, not the Canu-modified reads.

We are interested in exactly what @wdecoster mentions, a SV that seems to be moving around between chromosomes. And we are trying to assess if this is a gross mis-assembly or something real by looking at the genome coverage around this region.

Thanks very much for considering!

megancamilla avatar Jul 08 '20 23:07 megancamilla

@brentp Related to this, I'm working on cell-free DNA, where tumor-derived DNA fragments tend to be shorter than the blood-related fragments. I would like to test if measuring the coverage separately for <150bp and >150bp would give me better results. I could subset my bam files prior to using mosdepth, but it would be great if I could simply set fragment length filters in the mosdepth call. Would this be possible?

LudvigOlsen avatar Nov 01 '22 14:11 LudvigOlsen
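
As a rough illustration of the short-versus-long fragment comparison described above, here is a minimal Python sketch that reports mean coverage over a region separately for each class. It is not mosdepth's implementation. The function name mean_depth_by_fragment_class, the (start, end) tuple layout, and the choice to count fragments of exactly 150 bp as "long" are assumptions for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical fragment layout: (start, end), 0-based, half-open.
Fragment = Tuple[int, int]


def mean_depth_by_fragment_class(
    fragments: List[Fragment],
    region_start: int,
    region_end: int,
    cutoff: int = 150,
) -> Dict[str, float]:
    """Mean depth over a region, split into fragments < cutoff and >= cutoff."""
    region_len = region_end - region_start
    if region_len <= 0:
        raise ValueError("region must have positive length")

    covered = {"short": 0, "long": 0}
    for start, end in fragments:
        # Bases of this fragment that fall inside the region.
        overlap = min(end, region_end) - max(start, region_start)
        if overlap <= 0:
            continue
        # Assumption: a fragment of exactly `cutoff` bp is counted as long.
        key = "short" if (end - start) < cutoff else "long"
        covered[key] += overlap

    # Mean depth = covered bases / region length, per class.
    return {name: bases / region_len for name, bases in covered.items()}


# Toy data: one 100 bp fragment and two 200 bp fragments.
toy_fragments = [(0, 100), (50, 250), (900, 1100)]
means = mean_depth_by_fragment_class(toy_fragments, 0, 1000)
```

Running the same comparison on real data would mean deriving fragment start and end from properly paired reads, which is left out here.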

This is implemented in the latest release.

brentp avatar Nov 23 '23 12:11 brentp