doppelmark
ERROR: Could not parse name
Hi,
I got the following ERROR when running doppelmark (latest binary version) on a ~210 GB BAM file (doppelmark --bam HX.clean.bam -output HX.dedup1.bam -parallelism 10 --clip-padding=1000 -scratch-dir tmp1 -disk-mate-shards 1000):
...
I0406 06:51:53.611818 56185 mark_duplicates.go:855] shard[43689] info: &{{
Does anyone know why?
Best, Kun
Try using the command-line argument --optical-distance=-1.
Also, is there a reason you're setting --clip-padding=1000? How long are your reads?
Hi @yipal ,
The software worked well after using --optical-distance=-1.
My reads are 150 bp long, but with the default value I got "5' alignment distance(150) exceeds padding(143)", and with --clip-padding=152 I got "5' alignment distance(180) exceeds padding(152)". So I used an extremely large value. Will this value affect the results?
Best, Kun
Setting --clip-padding to 1000 should not cause wrong results, but it will cost you in computational efficiency. I'm confused as to how you have a 5' alignment distance of 180 when your read length is 150. Could you share the read that causes the clip-padding error?
Hi @yipal
Finally, doppelmark reported that the largest value is 219 (the 2nd line of the metrics file):
bio-mark-duplicates
maximum 5' alignment distance: 219
I really don't know why. Could it be caused by gap opening during mapping?
I don't know how to quickly extract such unusual reads from a large BAM file. Do you have any suggestions?
Best, Kun
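One rough way to pull out such reads, as a sketch: stream the BAM as SAM text with samtools view and keep the records whose unclipped 5' end lies far from POS. The Python below assumes the distance is the leading soft/hard clip length for forward reads, and the reference span plus trailing soft/hard clips for reverse-strand reads (FLAG bit 0x10). That is an assumption, not doppelmark's documented definition, and it may differ from doppelmark by a base. Under this assumption only CIGAR operations that consume reference but not read bases (D, N) can push the distance above the read length, so a reverse-strand 150 bp read with a 69 bp deletion (81M69D69M) comes out at 219, which fits the gap-opening guess. The helper names five_prime_distance and filter_sam are hypothetical.

```python
import re
import sys

# CIGAR operations that advance along the reference (per the SAM spec).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '81M69D69M' into (length, op) pairs."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def leading_clip(ops):
    """Total soft (S) and hard (H) clipping at the start of the op list."""
    total = 0
    for n, op in ops:
        if op not in "SH":
            break
        total += n
    return total


def five_prime_distance(flag, cigar):
    """Hypothetical measure of how far the unclipped 5' end lies from POS.

    ASSUMPTION (not doppelmark's confirmed definition):
      forward read -> leading clips
      reverse read -> reference span + trailing clips
    """
    ops = parse_cigar(cigar)
    if flag & 0x10:  # reverse strand: the 5' end is the right end of the alignment
        span = sum(n for n, op in ops if op in REF_CONSUMING)
        return span + leading_clip(list(reversed(ops)))
    return leading_clip(ops)


def filter_sam(lines, threshold):
    """Yield mapped SAM records whose 5' distance exceeds threshold."""
    for line in lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped or no alignment
        if five_prime_distance(flag, cigar) > threshold:
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only runs as a filter when a threshold argument is given.
    limit = int(sys.argv[1])
    for record in filter_sam(sys.stdin, limit):
        sys.stdout.write(record)
```

Usage, with a hypothetical script name: samtools view HX.clean.bam | python3 long_5prime.py 200 > long_5prime.sam. This is a single-threaded scan, so on a ~210 GB BAM it will be slow but needs very little memory. Under the same assumption, a plain reverse-strand 150M read scores 150, which happens to match the first error message above.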