Receiving low distribution values from exome data

Open ghosholivia opened this issue 5 years ago • 11 comments

Hi @brentp ,

For 14 exome samples, each with an exome BAM of approx. 3.3 GB, I am receiving low distribution values of depths from mosdepth tool compared to other tools such as Qualimap. [On average: 26% low value from the other tool] I used the parameters from your example:

mosdepth --by capture.bed sample-output sample.exome.bam

For both tools, capture.bed = 12 MB.

Can you help me understand why there is such a difference in the output?

Thanks, Olivia

ghosholivia avatar Oct 28 '20 10:10 ghosholivia

Hi Olivia, what file are you looking at that's showing low distribution values?

brentp avatar Oct 28 '20 13:10 brentp

I can likely help if you can start by answering this question ^

brentp avatar Oct 29 '20 16:10 brentp

Thanks for your reply, @brentp. These are HiSeq 2000 human exome data of ~120 million reads.

Thanks, Olivia

ghosholivia avatar Oct 30 '20 05:10 ghosholivia

I mean what mosdepth file

brentp avatar Oct 30 '20 12:10 brentp

The file “output.mosdepth.regions.dist.txt”. I ran the Python script (plot.dist.py) on this file to get the mean coverage.

ghosholivia avatar Oct 30 '20 14:10 ghosholivia
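
For readers following along, here is a minimal sketch of how a mean depth can be derived from a cumulative distribution like the one in the dist file. It assumes each row has three whitespace-separated columns (a label such as a chromosome name or "total", an integer depth, and the fraction of bases covered at that depth or higher) and that every integer depth from 1 up to the maximum is listed. The function name and the toy rows are hypothetical, and this is not necessarily what the plotting script does.

```python
def mean_depth_from_cumulative(lines, label="total"):
    """Estimate mean depth from rows of: label, depth, fraction of bases with depth >= that value.

    Assumption: every integer depth from 1 to the maximum has its own row.
    """
    frac_at_least = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or fields[0] != label:
            continue  # skip malformed rows and rows for other labels
        frac_at_least[int(fields[1])] = float(fields[2])
    # For a non-negative integer depth D: E[D] = sum over d >= 1 of P(D >= d).
    return sum(frac for depth, frac in frac_at_least.items() if depth >= 1)


# Toy data: four bases with depths 0, 1, 2, 2 (true mean = 1.25).
toy_rows = [
    "total\t2\t0.50",
    "total\t1\t0.75",
    "total\t0\t1.00",
]
toy_mean = mean_depth_from_cumulative(toy_rows)
print(toy_mean)
```

Comparing two tools is then a matter of computing each tool's mean over the same regions and taking the relative difference.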

ok. that's the right file. now can you clarify what you saw that was unexpected? by that, i mean, can you expand on this

I am receiving low distribution values of depths from mosdepth tool compared to other tools such as Qualimap. [On average: 26% low value from the other tool]

what value are you looking at from mosdepth and how did you get 26% and what is "low"?

brentp avatar Oct 30 '20 14:10 brentp

Yes. What I meant to say is the mean coverage I’m getting from mosdepth is ~26% lower than the value from Qualimap.

For WGS data, by contrast, the depth values from the two tools do not differ this much.

I’m attaching a screenshot of the Excel sheet with the differences mentioned above.

[Attached screenshot: 0E8A9240-1395-4E70-AD10-B598A37EFB44]

ghosholivia avatar Oct 30 '20 16:10 ghosholivia

ah. ok. the coverage from mosdepth is 26% less than from qualimap. I don't know how qualimap works, but you could try running mosdepth with --fast-mode and see if the numbers match more closely.

If they do, that means that qualimap does not adjust for overlapping read-pairs (mosdepth does by default).

brentp avatar Oct 30 '20 17:10 brentp

Okay sure. I’ll try that parameter and check. Thanks a lot.

ghosholivia avatar Oct 30 '20 17:10 ghosholivia

Hi, first of all, great tool! I just discovered it and it is a real Swiss Army knife! I see the same behavior when comparing with samtools and pileup.sh (from BBMap), and --fast-mode does indeed "fix" the discrepancy. Which value should be considered correct, with or without the correction for overlapping reads? Thanks

aspitaleri avatar Nov 15 '20 20:11 aspitaleri

Depends what you mean by "correct". If you use --fast-mode then you are double-counting overlapping reads from the same fragment. So you got one piece of information (the fragment) and you counted it twice wherever the reads overlapped.

brentp avatar Nov 15 '20 22:11 brentp
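
To make the double-counting concrete, here is a toy sketch of per-base depth computed with and without merging the overlap between the two mates of a fragment. It is only an illustration under simplified assumptions (half-open integer intervals, no CIGAR handling, no filtering) and is not how mosdepth is implemented. The function name and the example fragment are hypothetical.

```python
def per_base_depth(fragments, length, merge_mate_overlap=True):
    """Compute per-base depth over a region of the given length.

    fragments: list of ((start1, end1), (start2, end2)) half-open mate intervals.
    merge_mate_overlap: if True, a base covered by both mates counts once.
    """
    depth = [0] * length
    for (start1, end1), (start2, end2) in fragments:
        mate1 = set(range(start1, end1))
        mate2 = set(range(start2, end2))
        if merge_mate_overlap:
            # One fragment is one piece of evidence: count each base once.
            positions = sorted(mate1 | mate2)
        else:
            # Count each mate independently, so overlapping bases count twice.
            positions = sorted(mate1) + sorted(mate2)
        for pos in positions:
            depth[pos] += 1
    return depth


# One fragment whose mates overlap at positions 4 and 5.
toy_fragments = [((0, 6), (4, 10))]
corrected = per_base_depth(toy_fragments, 10, merge_mate_overlap=True)
naive = per_base_depth(toy_fragments, 10, merge_mate_overlap=False)
print(sum(corrected) / 10)  # mean depth 1.0
print(sum(naive) / 10)      # mean depth 1.2
```

In this toy case the uncorrected mean is 20% higher even though only one fragment was sequenced. The size of the gap in real data depends on how much the mates overlap, which tends to be larger when fragments are short relative to the read length.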