mosdepth

Add output file with # of alignment records used and ignored

Open af8 opened this issue 6 years ago • 5 comments

Hi Brent,

Would it be easy to output, along with the coverage files, counts of the alignment records parsed during the analysis?

Depending on the value of the --flag option on the command line, we could have a small file containing something like:

READS_USED    4567
READS_IGNORED    1234

This would make it possible to compute a theoretical (or intended) coverage for the experiment and compare it, for instance, with what is really on-target, without having to parse the BAM elsewhere.
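As a rough illustration of the requested counts, here is a minimal Python sketch that splits reads into used and ignored from their SAM flag values. It assumes that a read is ignored when any bit of the exclude mask is set, and it uses 1796 (unmapped, secondary, QC-fail, duplicate) as an example mask; substitute whatever value is passed to --flag. The function names and the tab-separated layout are hypothetical and are not part of mosdepth.

```python
from typing import Iterable, Tuple

# Assumption: example exclude mask (4 + 256 + 512 + 1024 = 1796).
# Replace it with the value given to --flag.
EXCLUDE_FLAG = 1796


def count_reads(flags: Iterable[int], exclude: int = EXCLUDE_FLAG) -> Tuple[int, int]:
    """Return (used, ignored) counts for an iterable of SAM flag integers."""
    used = 0
    ignored = 0
    for flag in flags:
        # A read is ignored if it has any of the excluded bits set.
        if flag & exclude:
            ignored += 1
        else:
            used += 1
    return used, ignored


def format_counts(used: int, ignored: int) -> str:
    """Render the counts in the two-line layout proposed above (hypothetical format)."""
    return f"READS_USED\t{used}\nREADS_IGNORED\t{ignored}\n"
```

The flag values themselves would have to come from a BAM/CRAM reader; that part is deliberately left out here.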

Thanks, Anthony

af8 avatar Jul 31 '18 08:07 af8

I'm not opposed to this, but if the use is coverage calculation, why not just use the values in the `*.dist.txt` files? There you can get the actual coverage.

brentp avatar Jul 31 '18 13:07 brentp

Yes, sure. I am actually making extensive use of all the files generated by Mosdepth to compute statistics on the observed coverage.

It would also be useful to simply report the expected coverage based on the number of reads generated in the experiment. The difference (exp - obs) gives some clues about how well the experiment went.

Basically, expected coverage would be (in a WES/target-seq setting):

R: read length
L: target size
N: number of reads
Expected mean coverage = N*R/L

Exp - Obs quickly gives an idea of how much information was lost (due to duplicate reads, off-target mapped reads, ...). I can get these numbers from other sources, but I was wondering whether they could be an easy by-product of Mosdepth. This would provide a kind of quick QC feature. But I can understand that this might not fit the philosophy of the tool.
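The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the formula N*R/L and of the Exp - Obs difference; the function names are hypothetical, and the observed mean coverage is assumed to be obtained separately (for example from the existing Mosdepth output files).

```python
def expected_mean_coverage(n_reads: int, read_length: int, target_size: int) -> float:
    """Expected mean coverage = N * R / L."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return n_reads * read_length / target_size


def coverage_loss(expected: float, observed: float) -> float:
    """Exp - Obs: coverage lost to duplicates, off-target reads, and so on."""
    return expected - observed


# Example: 1,000,000 reads of 100 bp over a 2,000,000 bp target.
exp = expected_mean_coverage(1_000_000, 100, 2_000_000)
print(exp)
```

With these example numbers the expected mean coverage is 50x, so an observed mean of 35x would correspond to a loss of 15x.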

af8 avatar Jul 31 '18 16:07 af8

I agree this would be useful. I'll think about how to expose and implement it.

brentp avatar Aug 01 '18 16:08 brentp

This is not off my radar, I'm still considering what to add. I'd also like to output mean/s.d. by chromosome so that a user could get a z-score as needed.
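For reference, a z-score from a per-chromosome mean and s.d. would be computed as below. This is a minimal sketch only; the function name and arguments are hypothetical and do not correspond to any existing mosdepth output.

```python
def z_score(depth: float, mean: float, sd: float) -> float:
    """Standard score of a depth value given a chromosome's mean and s.d."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (depth - mean) / sd
```

For example, a depth of 40 on a chromosome with mean 30 and s.d. 5 gives a z-score of 2.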

brentp avatar Sep 05 '18 16:09 brentp

This could probably be added to the summary output...

brentp avatar Aug 24 '19 03:08 brentp