mosdepth
Add output file with # of alignment records used and ignored
Hi Brent,
Would it be easy to output, along with the coverage files, counts of the alignment records parsed during the analysis?
Depending on the `--flag` option value given on the command line, we could have a small file containing something like:
READS_USED 4567
READS_IGNORED 1234
This would make it possible to compute a theoretical (or intended) coverage for the experiment and to compare it, for instance, with what is really on-target, without having to parse the BAM elsewhere.
Thanks, Anthony
I'm not opposed to this, but if the use is coverage calculation, why not just use the values in the `*.dist.txt` files? There you can get the actual coverage.
Yes, sure. I am actually making extensive use of all the files generated by mosdepth to compute some statistics related to the observed coverage.
It would also be useful to simply report the expected coverage based on the number of reads generated in the experiment. The difference (exp - obs) would give some clues about how well the experiment went.
Basically, expected coverage would be (in a WES/target-seq setting):
R: read length
L: target size
N: number of reads
Expected mean coverage = N*R/L
`Exp - Obs` quickly gives an idea of how much information we lost (due to duplicate reads, off-target mapped reads, ...). I can get these numbers from other sources, but I was just wondering whether they could be an easy by-product of mosdepth. This would allow a kind of quick QC feature. But I can understand that this might not be the philosophy of the tool.
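For illustration, the expected-coverage estimate above could be sketched in Python as follows. This is only a minimal sketch: the function names and the example numbers are hypothetical, N is assumed to count individual reads (not read pairs), and the observed mean coverage is assumed to have been obtained separately from the mosdepth output.

```python
def expected_mean_coverage(n_reads: int, read_length: int, target_size: int) -> float:
    """Expected mean coverage = N * R / L, as defined in the comment above.

    n_reads     -- N, number of reads generated in the experiment
    read_length -- R, read length in bases
    target_size -- L, total size of the targeted regions in bases
    """
    if target_size <= 0:
        raise ValueError("target_size must be a positive number of bases")
    return n_reads * read_length / target_size


def coverage_loss(expected: float, observed: float) -> float:
    """Exp - Obs: coverage lost to duplicates, off-target reads, and so on."""
    return expected - observed


# Hypothetical numbers, chosen only to illustrate the arithmetic.
exp = expected_mean_coverage(n_reads=40_000_000, read_length=150, target_size=60_000_000)
obs = 72.5  # assumed observed mean on-target coverage, computed elsewhere
print(f"expected={exp:.1f}x observed={obs:.1f}x lost={coverage_loss(exp, obs):.1f}x")
```

With the proposed counts file, `n_reads` could be read directly from it instead of being obtained by parsing the BAM a second time.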
I agree this would be useful. I'll think about how to expose and implement.
This is not off my radar; I'm still considering what to add. I'd also like to output the mean and s.d. by chromosome so that a user could get a z-score as needed.
This could probably be added to the summary output...