ERROR: Could not parse name

Open xiekunwhy opened this issue 3 years ago • 4 comments

Hi,

I got the following error when using doppelmark (latest binary version) to process a ~210 GB BAM file. The command was:

doppelmark --bam HX.clean.bam -output HX.dedup1.bam -parallelism 10 --clip-padding=1000 -scratch-dir tmp1 -disk-mate-shards 1000

Log output:

...
I0406 06:51:53.611818 56185 mark_duplicates.go:855] shard[43689] info: &{{ 0 2147483647 0 0 0 43689} 0 0 1899883503 1899883503}
E0406 06:51:56.743854 56185 optical_detector.go:124] Could not parse name: E100007937L1C015R0342816153, expected 5, 7, or 8 fields separated by ':'

Does anyone know why?

Best, Kun

xiekunwhy, Apr 06 '21 07:04

Try using the command line argument --optical-distance=-1

Also, is there a reason you're setting --clip-padding=1000? How long are your reads?

yipal, Apr 06 '21 14:04
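The error message says the optical detector expects read names with 5, 7, or 8 fields separated by ':'. As a minimal sketch (not doppelmark's own parser; the function names here are hypothetical), the following counts the colon-separated fields in a read name, which shows why the name from the log cannot be parsed: it contains no ':' at all.

```python
def colon_field_count(read_name: str) -> int:
    """Return the number of ':'-separated fields in a read name."""
    return len(read_name.split(":"))


def has_expected_field_count(read_name: str) -> bool:
    # 5, 7, or 8 fields is what the error message says is expected.
    # This only checks the field count, not the content of each field.
    return colon_field_count(read_name) in (5, 7, 8)


# Read name taken from the error message in this issue.
name = "E100007937L1C015R0342816153"
print(colon_field_count(name), has_expected_field_count(name))  # prints: 1 False
```

Read names that fail this check carry no colon-delimited coordinates for the detector to parse, which is consistent with the suggestion to turn optical detection off via --optical-distance=-1.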

Hi @yipal ,

The software worked well after using --optical-distance=-1.

My reads are 150 bp long, but with the default value I got "5' alignment distance(150) exceeds padding(143)", and with --clip-padding=152 I got "5' alignment distance(180) exceeds padding(152)". So I used an extremely large value. Will this value affect the results?

Best, Kun

xiekunwhy, Apr 06 '21 16:04

Setting the clip-padding to 1000 should not cause wrong results, but it will cost you in computational efficiency. I'm confused as to how you have a 5' alignment distance of 180 when your read length is 150. Could you share the read that causes the clip-padding error?

yipal, Apr 07 '21 15:04

Hi @yipal

In the end, doppelmark reported that the largest value is 219 (second line of the metrics file):

bio-mark-duplicates

maximum 5' alignment distance: 219

I really don't know why. Could it be caused by gap opening during mapping?

I don't know how to extract such strange reads quickly from a large BAM file. Do you have any suggestions?

Best, Kun

xiekunwhy, Apr 08 '21 01:04
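One possible way to find such reads is to scan SAM text records and compute a 5' distance from the FLAG and CIGAR fields. The sketch below is an assumption-laden illustration, not doppelmark's implementation: it assumes the distance is the number of leading clipped bases for forward reads, and the reference span plus trailing clipped bases for reverse reads. Under that assumption, deletions or skipped regions (CIGAR D or N) can push the distance of a 150 bp reverse-strand read well past 150. The function names and the sample records are hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_OPS = set("MDN=X")  # CIGAR operations that consume reference bases
CLIP_OPS = set("SH")    # soft and hard clips


def clip_len(ops):
    """Sum the clipped bases at the start of an iterable of (length, op) pairs."""
    total = 0
    for n, op in ops:
        if op not in CLIP_OPS:
            break
        total += n
    return total


def five_prime_distance(flag: int, cigar: str) -> int:
    """Assumed definition: distance from the alignment start to the unclipped 5' end."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if flag & 0x10:  # reverse strand: the 5' end is at the right edge
        ref_len = sum(n for n, op in ops if op in REF_OPS)
        return ref_len + clip_len(reversed(ops))
    return clip_len(ops)  # forward strand: only leading clips count


def filter_sam(lines, threshold):
    """Yield SAM records whose assumed 5' distance exceeds the threshold."""
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == "*":
            continue  # no CIGAR available
        flag = int(fields[1])
        if flag & 0x4:
            continue  # unmapped read
        if five_prime_distance(flag, fields[5]) > threshold:
            yield line


# Hypothetical records constructed for illustration (only the first six columns).
sample = [
    "r1\t0\tchr1\t100\t60\t150M\n",
    "r2\t16\tchr1\t100\t60\t150M\n",
    "r3\t16\tchr1\t100\t60\t100M69D50M\n",
]
for record in filter_sam(sample, 152):
    print(record, end="")  # prints only the r3 record
```

To scan a real file, the same generator could be fed from standard input, for example by piping `samtools view HX.clean.bam` into a script that calls `filter_sam(sys.stdin, 152)`. If the reads it reports contain long D or N operations, that would support the gap explanation; if not, the assumed distance definition is probably not the one doppelmark uses.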