canvas
Bin sizes and .binned files
Hello!
I am trying to run Canvas on exome samples, but the CNV values reported in the .seg files are very high. To normalise the values, I think I will need to add some unmatched controls.
When running “mono Canvas.exe Somatic-Enrichment -h”, I see that there are a few flags needed for this:
    --control-bam=VALUE          Bam file of an unmatched control sample. Option
                                 can be specified multiple times.
    --control-binned=VALUE       Canvas .binned file containing precomputed
                                 control bin data to use for normalization
    --control-bin-size=VALUE     bin size for control .binned file
    --control-ploidy-bed=VALUE   .bed file containing regions of known ploidy for
                                 the control .binned file
How is this .binned file created? How do I calculate the bin size? How is this .bed file made or what should it contain? Manually from the manifest file?
Thanks!
The control .binned file and its bin size are generated by running Canvas with the --control-bam option. Rather than recomputing the .binned file and bin size for each subsequent analysis run, you can provide the same control .binned file and its corresponding bin size directly.
For your initial analysis you can just provide a bam file for the control sample using the --control-bam option. The --control-ploidy-bed file affects normalization in regions that are known not to be diploid (e.g. male X excluding the PAR regions). It is important when using a female control with a male test sample or vice versa. If you don't have any targeted regions on X/Y, you can skip providing the ploidy bed file.
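To illustrate the two ways of supplying controls described above, here is a minimal Python sketch that assembles only the control-related options quoted from the help text. The function name `build_control_args` is hypothetical, and the rule that `--control-binned` and `--control-bin-size` must be given together is an assumption based on the help text, not confirmed Canvas behaviour. The other arguments Canvas needs are not shown.

```python
from typing import List, Optional, Sequence


def build_control_args(
    control_bams: Sequence[str] = (),
    control_binned: Optional[str] = None,
    control_bin_size: Optional[int] = None,
    control_ploidy_bed: Optional[str] = None,
) -> List[str]:
    """Return the control-related command-line options as a list of strings.

    Only the four flags shown in the help output are used here.
    """
    # Assumption: a precomputed .binned file is only meaningful together
    # with the bin size it was computed with.
    if (control_binned is None) != (control_bin_size is None):
        raise ValueError(
            "--control-binned and --control-bin-size must be given together"
        )

    # --control-bam may be repeated, once per unmatched control sample.
    args = [f"--control-bam={bam}" for bam in control_bams]

    if control_binned is not None:
        args.append(f"--control-binned={control_binned}")
        args.append(f"--control-bin-size={control_bin_size}")

    if control_ploidy_bed is not None:
        args.append(f"--control-ploidy-bed={control_ploidy_bed}")

    return args
```

For a first run, something like `build_control_args(control_bams=["control.bam"], control_ploidy_bed="male.ploidy.bed")` would yield the options to append to the rest of the Canvas command line (both file names are placeholders).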
Thank you for the response!
Follow-up question: Would this --control-ploidy-bed file just essentially be (very simplified):
Chromosome Start End
chrX 1 length(chrX)
chrY 1 length(chrY)
chrM 1 length(chrM)
for males and the same minus chrX for females to mark these chromosomes as non-diploid. I think the Pseudoautosomal regions are accounted for in the main exome manifest file.
Or should I take out all the regions from these chromosomes from the exome manifest file I am using, convert them to bed and use that as the control ploidy bed?
Hope that made sense
The ploidy bed file specifies the non-diploid regions and also the reference copy number (CN) for these regions. The correct format for a male sample, using hg38 (GRCh38) coordinates, is:
##ReferenceSexChromosomeKaryotype=XY
chrX 0 10001 ploidy 1
chrX 10001 2781479 ploidy 2
chrX 2781479 155701383 ploidy 1
chrX 155701383 156030895 ploidy 2
chrX 156030895 156040895 ploidy 1
chrY 0 57227415 ploidy 1
The CN=2 regions are included for completeness but can be removed since the default is always diploid. I don't see how the manifest file will be able to provide the correct ploidy for the PAR regions. The manifest file simply provides the regions you are interested in calling. You do not need to include chrM since we don't call on mito.
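As a rough sketch of how such a file could be produced, the Python function below rebuilds the male example from the PAR intervals and chromosome lengths given in the reply. The function name `male_ploidy_bed_lines` is hypothetical, and tab-separated columns are an assumption (the thread shows the columns separated by whitespace only).

```python
from typing import List, Sequence, Tuple


def male_ploidy_bed_lines(
    par_regions_x: Sequence[Tuple[int, int]],
    chrx_length: int,
    chry_length: int,
    include_diploid: bool = True,
) -> List[str]:
    """Build ploidy bed lines for a male (XY) sample.

    par_regions_x: sorted, non-overlapping (start, end) PAR intervals on chrX.
    include_diploid: also emit the ploidy 2 PAR lines, which the reply says
    are optional because diploid is the default.
    """
    lines = ["##ReferenceSexChromosomeKaryotype=XY"]
    pos = 0
    for start, end in par_regions_x:
        if start > pos:
            # Haploid stretch of chrX before this PAR.
            lines.append("\t".join(["chrX", str(pos), str(start), "ploidy", "1"]))
        if include_diploid:
            lines.append("\t".join(["chrX", str(start), str(end), "ploidy", "2"]))
        pos = end
    if pos < chrx_length:
        # Haploid remainder of chrX after the last PAR.
        lines.append("\t".join(["chrX", str(pos), str(chrx_length), "ploidy", "1"]))
    # All of chrY is listed as haploid, as in the example above.
    lines.append("\t".join(["chrY", "0", str(chry_length), "ploidy", "1"]))
    return lines


# Values copied from the example in the reply.
example = male_ploidy_bed_lines(
    par_regions_x=[(10001, 2781479), (155701383, 156030895)],
    chrx_length=156040895,
    chry_length=57227415,
)
print("\n".join(example))
```

Passing `include_diploid=False` drops the two ploidy 2 lines and leaves only the haploid regions.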
Thank you! I think I understand a bit better now.
Would I be correct in assuming that I do not need a bed file for female samples, as there would be no haploid regions that need to be filtered out?
Hello again Eric!
I have generated .binned and .binsize files for 11 exome samples where no disease-causing CNVs have been found.
I am a bit confused again as to how I am supposed to supply these control files to subsequent samples we wish to analyse. The --control-bin-size flag does not seem to accept either the .binsize file or the value contained within it:
Error parsing control-bin-size option: failed to convert
/medstore/Alvar_Almstedt/canvas_related/control_samples/binsize/female/G45/G45.binsize to
System.Nullable`1[System.UInt32]
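The error text shows the option being converted to an unsigned 32-bit integer, so a file path cannot be parsed. A minimal Python sketch for checking the value before passing it on, assuming the .binsize file holds a single integer as plain text (the thread does not show the file's contents; the function names are hypothetical):

```python
from pathlib import Path

UINT32_MAX = 2**32 - 1  # largest value a System.UInt32 can hold


def parse_bin_size(text: str) -> int:
    """Parse a bin size and check that it fits in an unsigned 32-bit integer."""
    value = int(text.strip())  # raises ValueError if the text is not one integer
    if not 0 <= value <= UINT32_MAX:
        raise ValueError(f"bin size {value} does not fit in a UInt32")
    return value


def bin_size_option(binsize_path: str) -> str:
    """Read a .binsize file and format the --control-bin-size option."""
    # Assumption: the file contains only the integer bin size.
    value = parse_bin_size(Path(binsize_path).read_text())
    return f"--control-bin-size={value}"
```

If `parse_bin_size` raises on the file contents, the file holds something other than a bare integer (extra text or a decimal value, for example), which might explain why the value was rejected as well.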
Also, is it possible to supply all the .binned files to each sample or only a single one? If multiple, how do I select which .binsizes belong to which .binned files? Are the bams still required as input when using .binned input?
Is this information available in the documentation somewhere and I have missed it?
Thank you!