cnvkit

Different behaviour scatter shell shorthand

Open lvree opened this issue 1 year ago • 2 comments

I'm very new to CNV analysis and ran across something I don't understand. As stated in the manual for the scatter command, there are two options: image

I assumed they would produce the same output, as it was only a shorthand.

But the first command produces this plot: image

And the second produces this: image

What explains this difference in behaviour, or am I misunderstanding something? And can someone explain what the difference in the plots means? For example, why does one have what look like lines while the other has a lot of points? Also, why does the first have its lowest point around -2.5 and the second below -3? I haven't done any manual scaling of the y-axis yet, so this is just the standard output.

One more thing: when I zoomed in on one chromosome, the lowest visible point is at -4: image

In the standard whole-genome plot that point isn't shown at all, but when I lower the minimum of the y-axis it is indeed there: image

Why does it cut the graph off when there are still points there? Is it because there are no orange points there?

Thank you very much!

lvree avatar Oct 18 '24 12:10 lvree

Ah, I'm sorry, I see I accidentally used the .call.cns file instead of the .cns file. But the output I got when using .call.cns was the same as when I ran this part on my dataset:

    cnvkit.py batch *Tumor.bam --normal *Normal.bam \
        --targets my_baits.bed --annotate refFlat.txt \
        --fasta hg19.fasta --access data/access-5kb-mappable.hg19.bed \
        --output-reference my_reference.cnn --output-dir results/ \
        --diagram --scatter

Furthermore, it says this: image

So why does the batch command use .call.cns instead of .cns, as stated?

lvree avatar Oct 18 '24 13:10 lvree

The file .call.cns is based on the first .cns, with some additional processing:

  • Absolute copy number is inferred, and certain segments are marked as having neutral copy number. These segments are plotted in gray instead of orange, so that the non-neutral segments are more distinct.
  • The confidence interval of each segment's mean is checked for overlap with neutral copy number (log2 ratio 0.0); if it overlaps, the segment is merged with any neighboring segments that also have neutral copy number. (Something like that.) So the .call.cns may also have fewer segment breakpoints, and the remaining segments may have slightly different mean log2 ratio values.
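
To make the second point concrete, here is a rough Python sketch of that kind of merge. It is not CNVkit's actual code: the `Segment` class, the `ci_lo`/`ci_hi` field names, the example values, and the length-weighted mean used for the merged segment are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical, simplified stand-in for one row of a segment file.
    chrom: str
    start: int
    end: int
    log2: float   # segment mean log2 ratio
    ci_lo: float  # lower bound of the confidence interval of the mean
    ci_hi: float  # upper bound of the confidence interval of the mean


def is_neutral(seg: Segment) -> bool:
    """True if the confidence interval overlaps neutral copy number (log2 0.0)."""
    return seg.ci_lo <= 0.0 <= seg.ci_hi


def merge_neutral(segments: list[Segment]) -> list[Segment]:
    """Merge runs of neighboring neutral segments on the same chromosome."""
    merged: list[Segment] = []
    for seg in segments:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.chrom == seg.chrom
            and is_neutral(prev)
            and is_neutral(seg)
        ):
            # Assumption of this sketch: the merged mean is weighted by
            # segment length, and the merged interval spans both inputs.
            len_prev = prev.end - prev.start
            len_seg = seg.end - seg.start
            log2 = (prev.log2 * len_prev + seg.log2 * len_seg) / (len_prev + len_seg)
            merged[-1] = Segment(
                prev.chrom,
                prev.start,
                seg.end,
                log2,
                min(prev.ci_lo, seg.ci_lo),
                max(prev.ci_hi, seg.ci_hi),
            )
        else:
            merged.append(seg)
    return merged


# Made-up example values, not taken from any real sample.
segments = [
    Segment("chr1", 0, 1000, 0.05, -0.10, 0.20),      # neutral
    Segment("chr1", 1000, 3000, -0.02, -0.15, 0.10),  # neutral, merges with the previous one
    Segment("chr1", 3000, 4000, -1.10, -1.30, -0.90), # loss, kept as-is
    Segment("chr1", 4000, 5000, 0.03, -0.05, 0.12),   # neutral, but its neighbor is not
    Segment("chr2", 0, 2000, 0.01, -0.08, 0.09),      # neutral, different chromosome
]
result = merge_neutral(segments)
print(len(segments), "->", len(result))
```

In this toy input the first two segments collapse into one, so five segments become four, and the merged segment's mean shifts slightly. That is the same kind of effect described above: fewer breakpoints and slightly different log2 values in the called file.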

etal avatar Nov 13 '24 06:11 etal