cnv_facets
How to generate GISTIC 2.0 input data from cnv_facets output
I got a VCF file from cnv_facets using WGS data. How can I get segment data from this VCF, especially the column with the number of probes?
Hi, can you clarify your question, perhaps by adding an example of the data and what you want from it? In general, you can use bcftools to parse VCF files.
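If bcftools is not at hand, a VCF data line is simple enough to split directly. The following is a minimal Python sketch, not part of cnv_facets or bcftools: it assumes a tab-separated line with the eight fixed VCF columns and turns the semicolon-separated INFO column into a dict. The function name `parse_vcf_record` is hypothetical.

```python
def parse_vcf_record(line: str) -> dict:
    """Split one VCF data line into its fixed columns plus a dict of INFO key/value pairs."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    info_dict = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)  # split on the first '=' only
            info_dict[key] = value
        else:
            info_dict[item] = True  # flag-type INFO entries carry no value
    return {"CHROM": chrom, "POS": int(pos), "ID": vid, "REF": ref, "ALT": alt,
            "QUAL": qual, "FILTER": flt, "INFO": info_dict}


# Example line built from the first record in this thread (INFO shortened).
line = "\t".join(["chr1", "13111", "1", "N", "<CNV>", ".", "PASS",
                  "SVTYPE=DUP;SVLEN=126819;END=139929;NUM_MARK=223"])
record = parse_vcf_record(line)
print(record["CHROM"], record["POS"], record["INFO"]["END"], record["INFO"]["NUM_MARK"])
```

Values are left as strings here; convert them (e.g. `int(record["INFO"]["END"])`) where you need numbers.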
Thanks for your response. I'll add an example. My raw data is WGS; I ran bwa for mapping and Picard to mark duplicates, then used cnv_facets to call CNVs. The VCF file from cnv_facets looks like this:
```
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 13111 1 N <CNV> . PASS SVTYPE=DUP;SVLEN=126819;END=139929;NUM_MARK=223;NHET=4;CNLR_MEDIAN=0.056;MAF_R=-0.098;SEGCLUST=42;CNLR_MEDIAN_CLUST=0.031;MAF_R_CLUST=.;CF_EM=0.342;TCN_EM=3;LCN_EM=.;CNV_ANN=.
chr1 158007 2 N <CNV> . PASS SVTYPE=DUP;SVLEN=3774146;END=3932152;NUM_MARK=8926;NHET=1569;CNLR_MEDIAN=0.184;MAF_R=-0.021;SEGCLUST=50;CNLR_MEDIAN_CLUST=0.155;MAF_R_CLUST=-0.005;CF_EM=0.295;TCN_EM=4;LCN_EM=2;CNV_ANN=.
chr1 3932496 3 N <CNV> . PASS SVTYPE=DUP;SVLEN=1524406;END=5456901;NUM_MARK=4385;NHET=1059;CNLR_MEDIAN=0.088;MAF_R=-0.015;SEGCLUST=43;CNLR_MEDIAN_CLUST=0.102;MAF_R_CLUST=-0.006;CF_EM=0.248;TCN_EM=4;LCN_EM=2;CNV_ANN=.
```
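Several INFO values in these records are `.` (e.g. `MAF_R_CLUST=.`, `LCN_EM=.`), which is the VCF convention for a missing value. A minimal sketch, assuming you only need the INFO column, that converts numeric values and maps `.` to `None` so missing fields are not mistaken for data (the function name `parse_info` is hypothetical):

```python
def parse_info(info: str) -> dict:
    """Parse a VCF INFO string into a dict, converting numbers and mapping '.' to None."""
    parsed = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        if not sep:
            parsed[key] = True  # flag without a value
            continue
        if value == ".":
            parsed[key] = None  # '.' marks a missing value in VCF
            continue
        for cast in (int, float):
            try:
                parsed[key] = cast(value)
                break
            except ValueError:
                pass
        else:
            parsed[key] = value  # keep non-numeric values (e.g. SVTYPE=DUP) as text
    return parsed


# INFO column of the first record above.
info = parse_info("SVTYPE=DUP;SVLEN=126819;END=139929;NUM_MARK=223;NHET=4;"
                  "CNLR_MEDIAN=0.056;MAF_R=-0.098;SEGCLUST=42;CNLR_MEDIAN_CLUST=0.031;"
                  "MAF_R_CLUST=.;CF_EM=0.342;TCN_EM=3;LCN_EM=.;CNV_ANN=.")
print(info["NUM_MARK"], info["CNLR_MEDIAN"], info["LCN_EM"])
```

With the first record, `NUM_MARK` comes back as the integer 223, `CNLR_MEDIAN` as the float 0.056, and `LCN_EM` as `None`.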
Next, I want to generate segment data for GISTIC 2.0. I noticed that GISTIC 2.0 needs a segmentation file (segmentationfile.txt).
I learned that its columns are "sample", "chromosome", "start position", "end position", "number of markers in segment", and "Seg.CN". How do I get "number of markers in segment" and "Seg.CN" from the VCF file generated by cnv_facets?
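For reference, here is a minimal sketch of writing rows in the six-column order described above as a tab-separated file. The header labels are an assumption based on that description; check them against the GISTIC 2.0 documentation for your version. The sample name `tumor1` is hypothetical, and the `Seg.CN` value below is only a placeholder (the raw `CNLR_MEDIAN` of the first record), not a correctly adjusted value.

```python
import csv
import io

# Column order as described in this thread; the exact header labels are an assumption.
GISTIC_COLUMNS = ["Sample", "Chromosome", "Start Position", "End Position",
                  "Num markers", "Seg.CN"]


def write_gistic_segments(rows, handle):
    """Write (sample, chrom, start, end, num_markers, seg_cn) tuples as tab-separated text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(GISTIC_COLUMNS)
    for row in rows:
        writer.writerow(row)


# Coordinates and marker count come from the first VCF record; the sample name is
# hypothetical and Seg.CN is a placeholder.
rows = [("tumor1", "chr1", 13111, 139929, 223, 0.056)]
buf = io.StringIO()
write_gistic_segments(rows, buf)
print(buf.getvalue(), end="")
```

To write a real file, pass `open("segmentationfile.txt", "w", newline="")` as `handle` instead of the in-memory buffer.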
I really appreciate your answer!
Hi,
You can use the value of 'NUM_MARK' for 'number of markers in segment', and 'CNLR_MEDIAN - dipLogR' for 'Seg.CN'.
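A minimal sketch of the suggestion above: start and end come from `POS` and the `END` INFO field, the marker count from `NUM_MARK`, and `Seg.CN` is computed as `CNLR_MEDIAN - dipLogR`. This is an illustration, not part of cnv_facets. The function name, the sample name `tumor1`, and the `dip_log_r=-0.05` value are all hypothetical; substitute the dipLogR that belongs to your own sample. Records whose `CNLR_MEDIAN` is missing (`.`) are skipped.

```python
def vcf_to_gistic_rows(vcf_lines, sample, dip_log_r):
    """Yield (sample, chrom, start, end, num_markers, seg_cn) from cnv_facets VCF lines.

    seg_cn is CNLR_MEDIAN - dipLogR, as suggested in this thread.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, info = fields[0], int(fields[1]), fields[7]
        tags = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
        if tags.get("CNLR_MEDIAN", ".") == ".":
            continue  # no log-ratio estimate for this segment
        seg_cn = float(tags["CNLR_MEDIAN"]) - dip_log_r
        yield (sample, chrom, pos, int(tags["END"]), int(tags["NUM_MARK"]), round(seg_cn, 4))


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
records = [  # first two records from this thread, INFO trimmed to the fields used here
    "\t".join(["chr1", "13111", "1", "N", "<CNV>", ".", "PASS",
               "SVTYPE=DUP;SVLEN=126819;END=139929;NUM_MARK=223;CNLR_MEDIAN=0.056"]),
    "\t".join(["chr1", "158007", "2", "N", "<CNV>", ".", "PASS",
               "SVTYPE=DUP;SVLEN=3774146;END=3932152;NUM_MARK=8926;CNLR_MEDIAN=0.184"]),
]
rows = list(vcf_to_gistic_rows([header] + records, sample="tumor1", dip_log_r=-0.05))
for row in rows:
    print("\t".join(str(x) for x in row))
```

Subtracting dipLogR re-centres each sample so that its diploid state sits at a log-ratio of 0, which makes segments comparable across samples in a cohort.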
More details at https://github.com/mskcc/facets/issues/84.
Hello @jamelee, I don't see a dipLogR field in the VCF file. So should we use 'CNLR_MEDIAN' as 'Seg.CN' for the GISTIC input?
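dipLogR is a per-sample value, so it would not appear as a per-record INFO field. One possibility, which is an assumption you should verify by inspecting your own file, is that it appears among the `##` meta-information lines of the VCF header, e.g. `##dipLogR=<value>`. The sketch below looks for such a line and returns `None` if none exists. The function name and the header lines in the example are hypothetical. If no dipLogR is available, using `CNLR_MEDIAN` directly is equivalent to assuming dipLogR = 0, i.e. that the sample's diploid state sits exactly at a log-ratio of 0.

```python
def read_dip_log_r(vcf_lines):
    """Return dipLogR from a '##dipLogR=<value>' header line, or None if absent.

    ASSUMPTION: the header key is literally 'dipLogR'; check your own VCF header.
    """
    for line in vcf_lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the '#CHROM' line
        key, sep, value = line[2:].rstrip("\n").partition("=")
        if sep and key == "dipLogR":
            try:
                return float(value)
            except ValueError:
                return None  # present but not numeric
    return None


# Hypothetical header lines for illustration only.
with_value = ["##fileformat=VCFv4.2", "##dipLogR=-0.05", "#CHROM\tPOS\tID"]
without_value = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID"]

dip = read_dip_log_r(with_value)
fallback = read_dip_log_r(without_value)
# Falling back to 0.0 reproduces "Seg.CN = CNLR_MEDIAN".
dip_or_zero = fallback if fallback is not None else 0.0
print(dip, fallback, dip_or_zero)
```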