cnvkit
cnvkit copied to clipboard
Start and end coordinates switched - is this expected?
Output of segmentrics
resulted in a call with start
> end
(cnvkit==0.9.9
):
Command:
cnvkit.py segmetrics -s <<output_from_segment_command>> <<output_from_fix_command>> -o <<output>> --ci --bootstrap 100
Result:
chromosome start end gene log2 depth probes weight
...
4 14258435 15448470 - -0.329143 0.000253774 1 0.998585
15 46657639 10488 - -0.41756 0 1 0
...
Is this to be expected? Should we include a post-processing step to filter out this 0-weight segment? Thank you for your advice.
That is not expected. It's also surprising that each of those segments has only 1 probe on it. Could you share the corresponding lines of the input .cns file?
Sure, here are the relevant lines:
From input cns file:
...
15 46657639 10488 - -0.41756000000000004 0.0 1 0.0 hmm-germline:0.0
...
Note that we concatenate the results of multiple segmenters into one input cns file and add a new column for method:weight
(in this case, hmm-germline with a weight of 0). This has worked for us in the past (we've recently upgraded from 0.8.5 to 0.9.9), and still continues to work, aside from this situation.
From cnr file:
chromosome start end gene log2 depth weight
1 10500 176917 Antitarget -9.84839 0 0.934803
1 318219 470868 Antitarget -9.04625 0 0.931417
1 570871 1221929 Antitarget -9.16632 0 0.99401
...
15 46657094 46657139 - -28.5108 0
15 46657639 47538217 Antitarget -0.41756 0.000222581 0.997043
15 47538217 48418796 Antitarget -7.24583 0 0.98933
15 48419296 48419340 SLC24A5 -28.7406 0
...
Thanks for the details. Since the segment has 0 weight I suppose it makes sense for you to add a post-processing step to remove 0-weight segments; it's not clear to me what the behavior should be when a segment and/or all data points supporting it are bogus, but scrambling the start and end coordinates is not particularly helpful. I'll see if I can figure out why this is happening internally.
Hi @etal,
Thanks for looking into this. I have another example of this issue that might be helpful:
Ratios:
chromosome start end gene log2 depth weight
...
13 94293168 94293212 GPC6 -6.12509 0.0227273 0.91648
13 94293212 94293257 GPC6 0.821666 1 0.919367
13 94293257 94293302 GPC6 0.812616 1 0.928625
13 94293302 94293347 GPC6 -0.193662 1 0.928721
...
Segments:
chromosome start end gene log2 depth probes weight method_weight
...
13 94293168 10488 - -1.1598 0.0 4 0.0 hmm-germline:0.0
...
We see that the segment 13:94293168-10488
uses four probes all at > 0.9 weight, just the corresponding end coordinate doesn't match the end coordinate of that last probe. Maybe 10488
represents some kind of internal code? This second case also occurs with hmm-germline
.