maftools
readGistic: possible parsing errors when there is only a single significant peak
I have a GISTIC result in which only one significant peak was found. When reading the GISTIC files, the parser likely gets confused, because several warnings are emitted:
Processing Gistic files..
Warning message in melt.data.table(all.lesions, id.vars = "cytoband"):
“'measure.vars' [S115-ready, S126-ready, S128-ready, S132-ready, ...] are not all of the same type. By order of hierarchy, the molten data value column will be of type 'double'. All measure variables not of type 'double' will be coerced too. Check DETAILS in ?melt.data.table for more on coercion.”
Processing XXX.amp_genes.conf_99.txt..
Processing XXX.del_genes.conf_99.txt..
Warning message in data.table::fread(input = gisticDelGenesFile, stringsAsFactors = FALSE, :
“Discarded single-line footer: <<genes in wide peak>>”
Processing XXX.scores.gistic..
Summarizing samples..
And then gisticOncoPlot gets utterly confused about what to plot (note how the rows are samples, not peaks).
Oh, I never accounted for cases like this. I will take a look at it soon.
Hi, I ran your code:
BLCA01laml.gistic = readGistic(gisticAllLesionsFile = all.lesions, gisticAmpGenesFile = amp.genes, gisticDelGenesFile = del.genes, gisticScoresFile = scores.gis, isTCGA = TRUE)
BLCA01laml.gistic@cnv.summary
BLCA01laml.gistic@gene.summary
and saw these results:
It confused me. What do the columns "Amp", "Del", and "total" refer to: the number of copy-number alterations per gene, or per sample? Thank you for your answer!
Hi @DrZhaoJie,
- CNV.summary represents the number of genes that are amplified/deleted in each sample.
- gene.summary represents the number of amp/del events reported per gene. The output is a bit confusing because of the way GISTIC reports events: GISTIC output files are not very well formatted, and the same gene is sometimes repeated for the same cytoband, which inflates the number of events associated with that gene. You can investigate this by opening one of the del_genes.conf_XX or amp_genes.conf_99 files and searching for one of the genes above. For example, MIR1244-1 is deleted in 306 samples, but because of its multiple occurrences it gets overrepresented.
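To illustrate how a repeated gene/cytoband entry inflates per-gene counts, here is a minimal base-R sketch on made-up data. It is not maftools code: the toy table and its column names (Hugo_Symbol, Tumor_Sample_Barcode, Variant_Classification, Cytoband) are assumptions modeled on MAF-style conventions, not a description of readGistic()'s internals. It compares a naive row count with counts taken after removing duplicate gene/sample/event rows, and also builds a per-sample summary analogous to CNV.summary.

```r
# Hypothetical toy data: one row per reported (cytoband, gene, sample, event).
# Column names follow MAF-style conventions and are an assumption here.
events <- data.frame(
  Cytoband = c("1p36.33", "1p36.33", "1p36.33", "1p36.33", "9p21.3"),
  Hugo_Symbol = c("GENE_A", "GENE_A", "GENE_A", "GENE_B", "GENE_C"),
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S1", "S2"),
  Variant_Classification = c("Del", "Del", "Del", "Del", "Amp"),
  stringsAsFactors = FALSE
)

# Naive per-gene count: every row is one "event", so the duplicated
# GENE_A/S1 row in the same cytoband is counted twice.
naive_counts <- table(events$Hugo_Symbol)

# Keep each gene/sample/event-type combination only once.
dedup <- unique(events[, c("Hugo_Symbol", "Tumor_Sample_Barcode",
                           "Variant_Classification")])

# Per-gene summary: Amp/Del counts (in distinct samples) plus a total.
gene_summary <- table(dedup$Hugo_Symbol, dedup$Variant_Classification)
gene_total <- rowSums(gene_summary)

# Per-sample summary: number of genes amplified/deleted in each sample.
sample_summary <- table(dedup$Tumor_Sample_Barcode,
                        dedup$Variant_Classification)

print(naive_counts)
print(gene_summary)
print(sample_summary)
```

Here GENE_A is counted 3 times naively but is altered in only 2 distinct samples, the same kind of overrepresentation described for MIR1244-1.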
Wow! Thank you very much! Your reply really cleared up my confusion! I'd like to ask another question. I investigated CNVs and calculated the CNV frequency of all genes using the "altered samples" data in gene.summary. What other calculations might be meaningful for measuring the effect of CNVs? Thank you very much!
What you mean by "other calculation" and "effect of CNV" is unclear here. If you can obtain what you want from gene.summary, why are you looking for other calculations?