d4-format
Part size limit causing odd results every 10M bases?
Hi, thanks for a great tool!
I'm seeing some issues in a d4 file (created with d4tools create -A basecalls.bam basecalls.d4).
There appear to be large flat regions of zero depth in the d4 file roughly every 10M bases, apparently coinciding with the part size limit.
This can be seen by listing intervals longer than 100 kb, for example:
d4tools show basecalls.d4 chr1_MATERNAL | csvtk add-header -n "ref,st,en,depth" | csvtk mutate2 -n diff -e '$en-$st' | csvtk filter2 -f '$diff > 100000'
ref            st         en         depth  diff
chr1_MATERNAL  0          560877     0      560877.00
chr1_MATERNAL  10000000   10657852   0      657852.00
chr1_MATERNAL  19999212   20671130   0      671918.00
chr1_MATERNAL  40000000   40199077   0      199077.00
chr1_MATERNAL  50000000   50549749   0      549749.00
chr1_MATERNAL  60000000   60798861   0      798861.00
chr1_MATERNAL  110000000  110292516  0      292516.00
chr1_MATERNAL  120000000  120302258  0      302258.00
chr1_MATERNAL  140000000  140279366  0      279366.00
chr1_MATERNAL  150000000  151804768  0      1804768.00
chr1_MATERNAL  170000000  170202385  0      202385.00
chr1_MATERNAL  179994048  180318995  0      324947.00
chr1_MATERNAL  190000000  190450433  0      450433.00
chr1_MATERNAL  220000000  220166024  0      166024.00
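To quantify how closely these runs line up with multiples of 10M, here is a minimal Python sketch. It assumes the input is whitespace-separated with four leading columns (ref, start, end, depth), as in the listing above. The 10,000,000 part size is only inferred from the spacing of the runs, and the function name, the 100,000 minimum run length, and the 10,000-base tolerance are hypothetical choices for illustration, not d4tools settings.

```python
PART_SIZE = 10_000_000  # assumed part size, inferred from the spacing of the runs
MIN_RUN = 100_000       # same length threshold as the csvtk filter
TOLERANCE = 10_000      # hypothetical: how far from a boundary still counts as "near"


def flat_runs_near_boundaries(lines, part_size=PART_SIZE,
                              min_run=MIN_RUN, tolerance=TOLERANCE):
    """Return zero-depth runs longer than min_run that start near a part boundary.

    Each result is (ref, start, end, length, distance_to_nearest_boundary).
    """
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            start, end, depth = int(fields[1]), int(fields[2]), float(fields[3])
        except ValueError:
            continue  # header row or malformed line
        length = end - start
        if depth != 0 or length <= min_run:
            continue
        # Distance from the run's start to the nearest multiple of part_size.
        offset = start % part_size
        distance = min(offset, part_size - offset)
        if distance <= tolerance:
            hits.append((fields[0], start, end, length, distance))
    return hits
```

Passing the lines of the d4tools show output (for example, sys.stdin when piping) to flat_runs_near_boundaries returns only the long zero-depth runs that begin within the tolerance of a multiple of the assumed part size.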
I have checked these regions with samtools depth and don't get unexpected output there, so the BAM appears to be well-formed.
Is there anything that can be done to get the correct depth in these regions, either by increasing the part size or by otherwise stitching the parts together?