
Exclude BED regions even with thresholds enabled

Open torfinnnome opened this issue 5 years ago • 5 comments

I align my reads against a genome with >232k scaffolds but am only interested in coverage reports for a few of them. This causes mosdepth to eventually be killed because it runs out of memory. Attached is a suggested patch, which seems to work for my case at least.

The combination of --no-per-base, --by file.bed, and -T seems to be causing this.
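To get a feel for the mismatch described above, the following is a minimal sketch that counts how many header sequences a BED file actually targets. It uses toy stand-in files; the names `header.sam` and `targets.bed` and the `scafN` sequence names are assumptions for illustration. On real data the header text would come from something like `samtools view -H`.

```shell
# Toy stand-ins in a scratch directory (hypothetical file names and contents).
workdir=$(mktemp -d)
printf '@SQ\tSN:scaf%d\tLN:1000\n' 1 2 3 4 5 > "$workdir/header.sam"
printf 'scaf2\t0\t1000\nscaf4\t0\t500\n' > "$workdir/targets.bed"

# First pass (BED): remember the sequence names of interest.
# Second pass (header): count @SQ lines and how many are named in the BED.
summary=$(awk -F'\t' '
  NR == FNR { want[$1] = 1; next }
  $1 == "@SQ" {
    total++
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SN:/ && (substr($i, 4) in want)) hit++
  }
  END { printf "%d of %d sequences targeted", hit, total }
' "$workdir/targets.bed" "$workdir/header.sam")
echo "$summary"
rm -rf "$workdir"
```

With >232k scaffolds in the header and only a handful in the BED, the second number dwarfs the first, which is the situation reported here.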

torfinnnome avatar Aug 20 '18 09:08 torfinnnome

I think that solution will cause other problems. Can you share a BAM I can use to reproduce this?

brentp avatar Aug 20 '18 13:08 brentp

Here's a small example: https://www.dropbox.com/s/u4q8or76t81fr56/mosdepth-testset.tar.gz?dl=0

The job took 1h27m, with peak memory usage of ~75 GB.

torfinnnome avatar Aug 22 '18 06:08 torfinnnome

@torfinnnome I think this is related to another, recently reported issue. Do you see huge memory use if you do not specify any threads?

brentp avatar Feb 19 '19 18:02 brentp

Without "-t" memory usage drops from ~75G to ~2G. However, it still takes ~1h30m.

torfinnnome avatar Feb 22 '19 08:02 torfinnnome

sorry for leaving this. I don't even have a good excuse.

Your solution does affect the output in some cases, but I think it's a reasonable compromise: if you use --no-per-base, you won't get all the output. I'll dig into this a bit more. I am reproducing it with:

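# BED with a single region on chr1; every other chromosome in the BAM is untargeted.
echo -e "chr1\t0\t100000000" > t.bed
# Build a SAM stream: header with N chromosomes, then one read pair per chromosome.
(echo -e "@HD\tVN:1.5\tGO:none\tSO:coordinate";
N=200
awk -v n=$N 'BEGIN{for(i=0;i<n;i++){ printf("@SQ\tSN:chr%d\tLN:100000000\n", i) }}'
echo -e "@RG\tID:sample\tSM:sample";
awk -v n=$N 'BEGIN{for(i=0;i<n;i++){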
printf("read%d\t161\tchr%d\t713109\t7\t74M\tchr%d\t20375284\t10000\tACCACGGCCAGCTAATTTTTGGATTTTTTGTAGAGACTGGGTTTCACCATGGCCAGGCTGGTCTCGCACTCCTG\t;?D7DD)@CDDD>?+,<C9CE1*:?*1?D0?DB9D<B@9?##################################\tMC:Z:74M\tMD:Z:21T43C0A7\tRG:Z:sample\tNM:i:3\tMQ:i:0\tAS:i:60\tXS:i:55\nread%d\t81\tchr%d\t20375284\t7\t74M\tchr%d\t713109\t-10000\tACCACGGCCAGCTAATTTTTGGATTTTTTGTAGAGACTGGGTTTCACCATGGCCAGGCTGGTCTCGCACTCCTG\t;?D7DD)@CDDD>?+,<C9CE1*:?*1?D0?DB9D<B@9?##################################\tMC:Z:74M\tMD:Z:21T43C0A7\tRG:Z:sample\tNM:i:3\tMQ:i:0\tAS:i:60\tXS:i:55\n", i, i, i, i, i, i)
}}'

) | samtools view -b -o t.bam && samtools index t.bam

ls -lh t.bam
echo "running mosdepth"
# "after" is the output prefix; ./t.bam is the input.
time ../mosdepth --no-per-base --by t.bed after ./t.bam -T 1,5,10,20,50,100,250,500,1000,5000,10000,50000
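
Before piping a generator like the one above into samtools, it can help to confirm that the SAM text has the expected shape. The following is a simplified sketch, not the script above: the `gen_sam` function is a hypothetical stand-in that keeps the same header layout but emits one short unpaired placeholder read per chromosome instead of the 74-base read pair.

```shell
# Hypothetical, simplified generator: N chromosomes, one 4-base read on each.
gen_sam() {
  n=$1
  printf '@HD\tVN:1.5\tSO:coordinate\n'
  awk -v n="$n" 'BEGIN { for (i = 0; i < n; i++) printf("@SQ\tSN:chr%d\tLN:100000000\n", i) }'
  printf '@RG\tID:sample\tSM:sample\n'
  awk -v n="$n" 'BEGIN { for (i = 0; i < n; i++) printf("read%d\t0\tchr%d\t713109\t7\t4M\t*\t0\t0\tACGT\t####\tRG:Z:sample\n", i, i) }'
}

sam_text=$(gen_sam 200)
# Header sequence lines start with @SQ; alignment records do not start with @.
sq_count=$(printf '%s\n' "$sam_text" | grep -c '^@SQ')
read_count=$(printf '%s\n' "$sam_text" | grep -vc '^@')
echo "@SQ lines: $sq_count, alignment records: $read_count"
```

Changing the argument to `gen_sam` scales the number of chromosomes, which is the variable of interest when checking how memory grows with the size of the header.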

brentp avatar Aug 22 '19 18:08 brentp