Bin sizes and .binned files

Open alvaralmstedt opened this issue 7 years ago • 5 comments

Hello!

I am trying to run Canvas on exome samples, but the CNV values reported in the .seg files are very high. To normalise the values, I think I will need to add some unmatched controls.

When running “mono Canvas.exe Somatic-Enrichment -h”, I see that there are a few flags needed for this:

--control-bam=VALUE          Bam file of an unmatched control sample. Option
                             can be specified multiple times.
--control-binned=VALUE       Canvas .binned file containing precomputed
                             control bin data to use for normalization
--control-bin-size=VALUE     bin size for control .binned file
--control-ploidy-bed=VALUE   .bed file containing regions of known ploidy for
                             the control .binned file

How is this .binned file created? How do I calculate the bin size? How is this .bed file made or what should it contain? Manually from the manifest file?

Thanks!

alvaralmstedt • Aug 14 '17 10:08

The control .binned file and its bin size are generated by running Canvas with the --control-bam option. Rather than recomputing the .binned file and bin size for each subsequent analysis run, you can provide the same control .binned file and its corresponding bin size directly.

For your initial analysis you can just provide a bam file for the control sample using the --control-bam option. The --control-ploidy-bed file affects normalization in regions that are known not to be diploid (e.g. male X excluding the PAR regions). It is important when using a female control with a male test sample or vice versa. If you don't have any targeted regions on X/Y, you can skip providing the ploidy bed file.
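
The two ways of supplying a control described above can be sketched as a small helper that assembles only the control-related options. This is a hedged illustration, not part of Canvas: the function name `control_args` is hypothetical, only the four flags quoted from the help output are used, and every other option that Canvas requires is deliberately left out. Requiring the .binned file and the bin size together is an assumption based on the phrase "corresponding bin size".

```python
from typing import List, Optional, Sequence


def control_args(
    control_bams: Sequence[str] = (),
    control_binned: Optional[str] = None,
    control_bin_size: Optional[int] = None,
    control_ploidy_bed: Optional[str] = None,
) -> List[str]:
    """Build the control-related command-line options only (hypothetical helper)."""
    has_binned = control_binned is not None
    has_size = control_bin_size is not None
    # Assumption: a precomputed .binned file is only usable with its bin size.
    if has_binned != has_size:
        raise ValueError("--control-binned and --control-bin-size go together")
    if not control_bams and not has_binned:
        raise ValueError("no control BAM or precomputed .binned file given")

    args: List[str] = []
    # The help text says --control-bam can be specified multiple times.
    for bam in control_bams:
        args.append(f"--control-bam={bam}")
    if has_binned:
        args.append(f"--control-binned={control_binned}")
        args.append(f"--control-bin-size={control_bin_size}")
    if control_ploidy_bed is not None:
        args.append(f"--control-ploidy-bed={control_ploidy_bed}")
    return args


if __name__ == "__main__":
    # First run: let Canvas compute the control bins from the BAM files.
    print(" ".join(control_args(control_bams=["ctrl1.bam", "ctrl2.bam"])))
```

The returned list would be appended to the rest of the `mono Canvas.exe Somatic-Enrichment ...` command line; the file names in the usage lines are placeholders.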

eroller • Aug 14 '17 17:08

Thank you for the response!

Follow-up question: Would this --control-ploidy-bed file just essentially be (very simplified):

Chromosome    Start    End
chrX          1        length(chrX)
chrY          1        length(chrY)
chrM          1        length(chrM)

for males, and the same minus chrX for females, to mark these chromosomes as non-diploid? I think the pseudoautosomal regions are accounted for in the main exome manifest file.

Or should I take out all the regions from these chromosomes from the exome manifest file I am using, convert them to bed and use that as the control ploidy bed?

Hope that made sense

alvaralmstedt • Aug 15 '17 08:08

The ploidy bed file specifies the non-diploid regions and also the reference CN for these regions. The correct format for a male sample is shown below; the chromosome lengths and PAR boundaries used here are those of GRCh38 (hg38), not hg19:

##ReferenceSexChromosomeKaryotype=XY
chrX	0	10001	ploidy	1
chrX	10001	2781479	ploidy	2
chrX	2781479	155701383	ploidy	1
chrX	155701383	156030895	ploidy	2
chrX	156030895	156040895	ploidy	1
chrY	0	57227415	ploidy	1

The CN=2 regions are included for completeness but can be removed since the default is always diploid. I don't see how the manifest file will be able to provide the correct ploidy for the PAR regions. The manifest file simply provides the regions you are interested in calling. You do not need to include chrM since we don't call on mito.
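
As a rough sketch, the male ploidy bed shown above could be generated from the chromosome lengths and the PAR boundaries rather than typed by hand. The function name `male_ploidy_bed` and its parameters are hypothetical; the coordinates in the usage lines are copied from the example above and must be replaced with values matching your own reference. Tab separation and the header line follow the example; whether the header is mandatory is not stated in this thread.

```python
from typing import Iterable, List, Tuple


def male_ploidy_bed(
    chrx_len: int,
    chry_len: int,
    par_regions: Iterable[Tuple[int, int]],
    include_diploid: bool = True,
) -> List[str]:
    """Return ploidy bed lines for an XY sample (hypothetical helper).

    par_regions are (start, end) intervals on chrX that stay diploid.
    """
    lines = ["##ReferenceSexChromosomeKaryotype=XY"]
    pos = 0
    for start, end in sorted(par_regions):
        if start > pos:
            # Haploid stretch of chrX before this PAR.
            lines.append(f"chrX\t{pos}\t{start}\tploidy\t1")
        if include_diploid:
            # Optional: diploid is the default, so these lines can be dropped.
            lines.append(f"chrX\t{start}\t{end}\tploidy\t2")
        pos = end
    if pos < chrx_len:
        # Haploid remainder of chrX after the last PAR.
        lines.append(f"chrX\t{pos}\t{chrx_len}\tploidy\t1")
    lines.append(f"chrY\t0\t{chry_len}\tploidy\t1")
    return lines


if __name__ == "__main__":
    # Values copied from the example in this comment.
    pars = [(10001, 2781479), (155701383, 156030895)]
    print("\n".join(male_ploidy_bed(156040895, 57227415, pars)))
```

Passing `include_diploid=False` leaves out the CN=2 lines, which the comment above says are optional.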

eroller • Aug 15 '17 16:08

Thank you! I think I understand a bit better now.

Would I be correct in assuming that I do not need a bed file for female samples, as there would be no haploid regions that need to be filtered out?

alvaralmstedt • Aug 16 '17 09:08

Hello again Eric!

I have generated .binned and .binsize files for 11 exome samples where no disease-causing CNVs have been found.

I am a bit confused again as to how I am supposed to supply these control files to subsequent samples we wish to analyse. The --control-bin-size flag does not seem to accept either the .binsize file or the value contained within it:

Error parsing control-bin-size option: failed to convert 
/medstore/Alvar_Almstedt/canvas_related/control_samples/binsize/female/G45/G45.binsize to 
System.Nullable`1[System.UInt32]
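
The error text shows the option value being converted to `System.Nullable<UInt32>`, so a file path cannot be parsed. A small pre-check like the one below can show whether a given value is a plain unsigned 32-bit integer before it is passed on. This is only a sketch: `parse_bin_size` and `bin_size_option` are hypothetical helpers, and the idea that a .binsize file holds a single integer is an assumption, not something confirmed in this thread.

```python
UINT32_MAX = 2**32 - 1


def parse_bin_size(text: str) -> int:
    """Parse text as an unsigned 32-bit integer, or raise ValueError."""
    stripped = text.strip()
    # Accept plain ASCII digits only: no sign, decimal point, or separators.
    if not (stripped.isascii() and stripped.isdigit()):
        raise ValueError(f"not an unsigned integer: {text!r}")
    value = int(stripped)
    if value > UINT32_MAX:
        raise ValueError(f"does not fit in 32 bits: {value}")
    return value


def bin_size_option(text: str) -> str:
    """Format the option with a validated numeric value."""
    return f"--control-bin-size={parse_bin_size(text)}"


if __name__ == "__main__":
    for candidate in ["100\n", "/some/path/sample.binsize", "12.5"]:
        try:
            print(bin_size_option(candidate))
        except ValueError as err:
            print("rejected:", err)
```

If the value stored in the file is rejected by a check like this (for example because it is not a whole number), that may be worth mentioning when reporting the problem.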

Also, is it possible to supply all the .binned files to each sample or only a single one? If multiple, how do I select which .binsizes belong to which .binned files? Are the bams still required as input when using .binned input?

Is this information available in the documentation somewhere and I have missed it?

Thank you!

alvaralmstedt • Aug 28 '17 13:08