facets
Get the log2R for each segment from the FACETS result
Hi,
I am using FACETS to estimate the ASCNV for my WES data. I can now get the absolute copy number for each segment, but I then want to call recurrent CNVs with GISTIC, which requires a log2 ratio for each segment. Could you tell me how to get the logR information from the FACETS result? Thanks!
Below is my code for running FACETS.
library(facets)

# Directory of SNP pileup files (one per sample) and the output directory
input_dir <- '/data1/XiLab/shiyang/data/ICC_103/ICC_103_snp_pileup_1/'
output_dir <- '/data1/XiLab/shiyang/data/ICC_103/ICC_103_facets_result_1/'
files <- dir(input_dir)

for (file in files) {
  path <- paste0(input_dir, file)
  # Sample name is the part of the file name before the first dot
  case <- strsplit(file, '\\.')[[1]][1]

  # Pre-process the pileup, segment it, then fit purity, ploidy and copy number
  pre_data <- preProcSample(file = path)
  data <- procSample(pre_data, cval = 150)
  fit_data <- emcncf(data)

  pur <- fit_data$purity
  plo <- fit_data$ploidy
  tmp_df <- data.frame(sample = case, purity = pur, ploidy = plo)

  # Per-segment table, written as a tab-separated file
  seg_df <- data.frame(fit_data$start, fit_data$end, fit_data$cncf)
  write.table(seg_df, paste0(output_dir, case, '.seg_info.txt'), sep = '\t', row.names = FALSE)
}
The cnlr.median for each segment is the relevant segment log-ratio value.
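As a minimal sketch of where that value can be read from: the example below assumes a FACETS version in which emcncf() returns a list whose cncf element is a data frame with one row per segment and columns chrom, start, end, num.mark and cnlr.median. The fit object here is a made-up stand-in so the snippet runs without FACETS; check names(fit_data$cncf) on your own result.

```r
# Mock stand-in for the list returned by emcncf(); all values are invented.
fit <- list(
  purity = 0.6,
  ploidy = 2.3,
  dipLogR = -0.12,
  cncf = data.frame(
    chrom = c(1, 1, 2),
    start = c(10000, 5000000, 20000),
    end = c(4999999, 9000000, 8000000),
    num.mark = c(350, 280, 410),
    cnlr.median = c(-0.10, 0.45, -0.70),
    tcn.em = c(2, 4, 1)
  )
)

# One row per segment: coordinates plus the observed log-ratio summary.
seg_logr <- fit$cncf[, c("chrom", "start", "end", "num.mark", "cnlr.median")]
print(seg_logr)
```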
Thank you!
Hi Venkat,
Adding to this: can we really use these "raw" cnlr.median values, for example to check the concordance of copy number between two matched samples? In many samples the dipLogR value is not zero. Shouldn't we therefore correct each cnlr.median for dipLogR so that the data are comparable?
Thanks for your valuable input! Best,
Cedric
Hi Cedric,
cnlr.median is the segment-level summary of the observed data. Correcting it for dipLogR takes one line of code, but once corrected it is no longer a summary of the observed data. Corrected values also aren't comparable between two samples, because sample purity has to be accounted for as well. It is best to compare the estimated copy numbers.
Thanks, Venkat
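To illustrate the purity point with numbers, here is a small sketch. It assumes the usual tumor/normal mixture model, in which normal cells contribute 2 copies and the dipLogR-centered log-ratio of a segment with tumor total copy number tcn at purity p is log2((2(1 - p) + p * tcn) / 2). The helper name expected_centered_logr is hypothetical and not part of FACETS.

```r
# Expected (cnlr.median - dipLogR) under a simple tumor/normal mixture model.
# Assumption: normal cells carry 2 copies; `purity` is the tumor cell fraction.
expected_centered_logr <- function(tcn, purity) {
  log2((2 * (1 - purity) + purity * tcn) / 2)
}

# The same 4-copy segment observed in a low-purity and a high-purity sample.
lr_low <- expected_centered_logr(tcn = 4, purity = 0.3)
lr_high <- expected_centered_logr(tcn = 4, purity = 0.8)

print(c(low_purity = lr_low, high_purity = lr_high))
```

Under this model the same tumor copy number yields a smaller centered log-ratio in the lower-purity sample, which is why centered values from two samples are not directly comparable.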
Hello,
I want to generate segmentation files for GISTIC as well. I see what look like conflicting answers on this in https://github.com/mskcc/facets/issues/84#issuecomment-392079533. Please correct me if I am wrong: do we need to subtract dipLogR from cnlr.median for GISTIC analysis across a cohort?
Thanks.
I don't see a conflict. Please use cnlr.median - dipLogR
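A minimal sketch of building such a segmentation table for one sample follows. It assumes the cncf data frame has the columns chrom, start, end, num.mark and cnlr.median, and that the six-column layout (sample, chromosome, start, end, number of markers, segment value) is what your GISTIC setup expects. The function name make_gistic_seg, the output column names, the sample ID and the numbers are all illustrative.

```r
# Build a six-column segmentation table with dipLogR-centered log-ratios.
# `cncf` is the per-segment data frame from the fit; `dipLogR` is the fit's
# estimated diploid log-ratio. Column names in the result are illustrative.
make_gistic_seg <- function(cncf, dipLogR, sample_id) {
  data.frame(
    Sample = sample_id,
    Chromosome = cncf$chrom,
    Start = cncf$start,
    End = cncf$end,
    Num_Probes = cncf$num.mark,
    Segment_Mean = cncf$cnlr.median - dipLogR,
    stringsAsFactors = FALSE
  )
}

# Made-up segment table standing in for fit_data$cncf.
cncf <- data.frame(
  chrom = c(1, 1, 2),
  start = c(10000, 5000000, 20000),
  end = c(4999999, 9000000, 8000000),
  num.mark = c(350, 280, 410),
  cnlr.median = c(-0.10, 0.45, -0.70)
)

seg <- make_gistic_seg(cncf, dipLogR = -0.12, sample_id = "case01")

# Write a tab-separated file; rbind() the per-sample tables for a cohort.
out_file <- file.path(tempdir(), "case01.seg")
write.table(seg, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Inside the loop from the original question, the same call could be made with fit_data$cncf, fit_data$dipLogR and case in place of the mock values.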
The #84 comment says "(cnlr.median - dipLogR) is the log(total copy number) unadjusted for purity which is what you want" and mentions "the log tcn values, which corresponds to also adjusting for the purity", while the comment above says "cnlr.median is the segment level summary of observed data". Did you mean:
- cnlr.median is the raw segment-level value, not adjusted for purity or ploidy?
- cnlr.median - dipLogR is adjusted for ploidy only?
- the log tcn values are adjusted for both purity and ploidy?
Sorry, I have been confused by these values. Please correct me if I have made a mistake. Which value should be used for GISTIC 2.0?
Thank you very much.