
Part size limit causing odd results every 10M bases?

Open · onordesjo opened this issue 2 years ago · 0 comments

Hi, thanks for a great tool!

I'm seeing unexpected depth values in a d4 file (created with d4tools create -A basecalls.bam basecalls.d4).

There appear to be large flat regions of zero depth in the d4 file roughly every 10M bases, which may coincide with the part size limit.

This shows up, for example, when listing all intervals longer than 100 kb:

d4tools show basecalls.d4 chr1_MATERNAL | csvtk add-header -n  "ref,st,en,depth" | csvtk mutate2 -n diff -e '$en-$st' | csvtk filter2 -f '$diff > 100000'

ref     st      en      depth   diff
chr1_MATERNAL   0       560877  0       560877.00
chr1_MATERNAL   10000000        10657852        0       657852.00
chr1_MATERNAL   19999212        20671130        0       671918.00
chr1_MATERNAL   40000000        40199077        0       199077.00
chr1_MATERNAL   50000000        50549749        0       549749.00
chr1_MATERNAL   60000000        60798861        0       798861.00
chr1_MATERNAL   110000000       110292516       0       292516.00
chr1_MATERNAL   120000000       120302258       0       302258.00
chr1_MATERNAL   140000000       140279366       0       279366.00
chr1_MATERNAL   150000000       151804768       0       1804768.00
chr1_MATERNAL   170000000       170202385       0       202385.00
chr1_MATERNAL   179994048       180318995       0       324947.00
chr1_MATERNAL   190000000       190450433       0       450433.00
chr1_MATERNAL   220000000       220166024       0       166024.00
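For anyone reproducing this without csvtk, here is a minimal Python sketch of the same filter (intervals with en - st > 100000). It also reports how far each long interval starts from the nearest multiple of 10 Mb. Two assumptions: the 10 Mb spacing is taken from the pattern in the output above, not from the d4 format itself, and `d4tools show` is assumed to print tab-separated `ref st en depth` columns (the same four columns the csvtk pipeline names). The helper names `long_flat_intervals` and `distance_to_boundary` are hypothetical.

```python
import csv

# Assumption: boundary spacing inferred from the output above (~10 Mb),
# not read from the d4 file or its specification.
BOUNDARY = 10_000_000


def long_flat_intervals(lines, min_len=100_000):
    """Yield (ref, st, en, depth, length) for intervals longer than min_len.

    Assumes tab-separated `ref st en depth` lines, as printed by d4tools show.
    Mirrors the csvtk filter `$diff > 100000` (strictly greater).
    """
    for ref, st, en, depth in csv.reader(lines, delimiter="\t"):
        st, en, depth = int(st), int(en), int(depth)
        if en - st > min_len:
            yield ref, st, en, depth, en - st


def distance_to_boundary(pos, boundary=BOUNDARY):
    """Distance in bases from pos to the nearest multiple of `boundary`."""
    r = pos % boundary
    return min(r, boundary - r)


# Three rows copied from the output above, plus one hypothetical short
# interval (not from the issue) to show that the length filter drops it.
rows = [
    "chr1_MATERNAL\t0\t560877\t0",
    "chr1_MATERNAL\t560877\t560900\t3",
    "chr1_MATERNAL\t19999212\t20671130\t0",
    "chr1_MATERNAL\t179994048\t180318995\t0",
]
for ref, st, en, depth, length in long_flat_intervals(rows):
    print(ref, st, en, depth, length, distance_to_boundary(st), sep="\t")
# chr1_MATERNAL  0          560877     0  560877  0
# chr1_MATERNAL  19999212   20671130   0  671918  788
# chr1_MATERNAL  179994048  180318995  0  324947  5952
```

Piping the real `d4tools show basecalls.d4 chr1_MATERNAL` output into `long_flat_intervals(sys.stdin)` would give the full list; a small last column for every row would support the part-boundary explanation.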

I checked these regions with samtools depth and it gives the expected output there, so the BAM appears to be well-formed.
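To make that check position-by-position, a rough Python sketch can compare `d4tools show` intervals against `samtools depth` output and list every position where the two disagree. Assumptions: `d4tools show` intervals are 0-based and half-open (BED-style), while `samtools depth` prints `ref pos depth` with 1-based positions; both are read as tab-separated text. The function names are hypothetical.

```python
import csv
from bisect import bisect_right


def load_d4_intervals(lines):
    """Index tab-separated `ref st en depth` lines per reference.

    Assumption: intervals are 0-based, half-open [st, en).
    Returns {ref: (sorted_starts, sorted_intervals)}.
    """
    by_ref = {}
    for ref, st, en, depth in csv.reader(lines, delimiter="\t"):
        by_ref.setdefault(ref, []).append((int(st), int(en), int(depth)))
    index = {}
    for ref, ivs in by_ref.items():
        ivs.sort()
        index[ref] = ([st for st, _, _ in ivs], ivs)
    return index


def d4_depth_at(index, ref, pos0):
    """Depth at 0-based pos0, or None if no d4 interval covers it."""
    starts, ivs = index.get(ref, ([], []))
    i = bisect_right(starts, pos0) - 1  # last interval starting at or before pos0
    if i >= 0 and pos0 < ivs[i][1]:
        return ivs[i][2]
    return None


def compare_with_samtools(index, samtools_lines):
    """Yield (ref, pos1, samtools_depth, d4_depth) wherever they differ.

    samtools depth positions are 1-based, so pos1 - 1 is looked up in d4.
    """
    for ref, pos, depth in csv.reader(samtools_lines, delimiter="\t"):
        pos1, depth = int(pos), int(depth)
        d4 = d4_depth_at(index, ref, pos1 - 1)
        if d4 != depth:
            yield ref, pos1, depth, d4


# Example usage (file names are hypothetical):
# with open("d4.tsv") as a, open("samtools.tsv") as b:
#     for row in compare_with_samtools(load_d4_intervals(a), b):
#         print(*row, sep="\t")
```

The inputs could come from `d4tools show basecalls.d4 chr1_MATERNAL` and, for one of the flat regions, `samtools depth -a -r chr1_MATERNAL:10000001-10657852 basecalls.bam` (`-a` keeps zero-depth positions so both sides cover the same bases). Building the start list once per reference keeps each lookup at O(log n).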

Is there a way to get the correct depth in these regions, either by increasing the part size or by otherwise stitching the parts together?

onordesjo, Nov 03 '22 17:11