
readGistic: possible parsing errors in case there is a single significant peak

Open lbeltrame opened this issue 5 years ago • 5 comments

I have a GISTIC result in which only one significant peak was found. When reading the GISTIC files, the parser likely gets confused, because several warnings are emitted:

Processing Gistic files..
Warning message in melt.data.table(all.lesions, id.vars = "cytoband"):
“'measure.vars' [S115-ready, S126-ready, S128-ready, S132-ready, ...] are not all of the same type. By order of hierarchy, the molten data value column will be of type 'double'. All measure variables not of type 'double' will be coerced too. Check DETAILS in ?melt.data.table for more on coercion.”
Processing XXX.amp_genes.conf_99.txt..
Processing XXX.del_genes.conf_99.txt..
Warning message in data.table::fread(input = gisticDelGenesFile, stringsAsFactors = FALSE, :
“Discarded single-line footer: <<genes in wide peak>>”
Processing XXX.scores.gistic..
Summarizing samples..

gisticOncoPlot then gets utterly confused about what to plot:

[Image: Screenshot_20190502_104824]

(note how rows are samples, and not peaks).
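
A purely illustrative base R sketch of one way rows and columns can end up swapped when only a single peak is present. This is an assumption, not a diagnosis confirmed anywhere in this thread: subsetting a matrix down to one row drops its dimensions by default, and converting the resulting vector back to a matrix puts the samples in rows. The matrix, peak names and sample names below are made up.

```r
# Toy copy-number matrix: rows are peaks, columns are samples (made-up values).
cn <- matrix(
  c(1, 0, 2,
    0, 1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("peak1", "peak2"), c("S1", "S2", "S3"))
)

# Selecting a single peak without drop = FALSE returns a plain named vector.
dropped <- cn["peak1", ]

# Turning that vector back into a matrix yields one column, with samples as rows.
flipped <- as.matrix(dropped)

# Keeping the dimensions preserves the peaks-by-samples orientation.
kept <- cn["peak1", , drop = FALSE]

print(dim(flipped))
print(dim(kept))
```

If something like this were the cause, a `drop = FALSE` guard on the single-row subset would keep peaks in rows; whether readGistic actually takes this code path is not established here.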

lbeltrame, May 02 '19 08:05

Oh, I never accounted for cases like this. I will take a look at it soon.

PoisonAlien, May 02 '19 09:05

Hi, I ran your code:

BLCA01laml.gistic = readGistic(gisticAllLesionsFile = all.lesions, gisticAmpGenesFile = amp.genes, gisticDelGenesFile = del.genes, gisticScoresFile = scores.gis, isTCGA = TRUE)
[email protected]
[email protected]

and saw these results:

[image]
[image]

The results confused me. What do the columns "Amp", "Del" and "total" refer to? Do they count copy number variation events per gene or per sample? Thank you for your answer!

DrZhaoJie, Jan 02 '21 02:01

Hi @DrZhaoJie,

  1. CNV.summary gives the number of genes that are amplified/deleted in each sample.
  2. gene.summary gives the number of amp/del events reported per gene. The output is a bit confusing because of the way GISTIC reports them: GISTIC output files are not well formatted. Sometimes the same gene is repeated for the same cytoband in the output files, which inflates the number of events associated with that gene. You can investigate this by opening one of the del_genes.conf_XX or amp_genes.conf_99 files and searching for one of the above genes. For example, MIR1244-1 is deleted in 306 samples, but because it occurs multiple times it is overrepresented.
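
The overcounting described in point 2 can be illustrated with a minimal base R sketch. The table is made up: the column names `cytoband`, `gene` and `sample` and all values are hypothetical, and they do not reproduce the real GISTIC file layout or what readGistic does internally. The sketch only shows how repeated gene/cytoband rows inflate a per-gene event count, and how counting each gene/sample pair once avoids that.

```r
# Hypothetical long-format table of deletion events (made-up values).
# The duplicated rows mimic a gene listed more than once for the same cytoband.
events <- data.frame(
  cytoband = c("1p36", "1p36", "1p36", "1p36", "9p21"),
  gene     = c("GENE_A", "GENE_A", "GENE_A", "GENE_A", "GENE_B"),
  sample   = c("S1", "S1", "S2", "S2", "S1"),
  stringsAsFactors = FALSE
)

# Naive count: every row is treated as one event, so repeats inflate the total.
naive_counts <- table(events$gene)

# Count each gene/sample pair once by dropping duplicate pairs first.
unique_events <- unique(events[, c("gene", "sample")])
dedup_counts <- table(unique_events$gene)

print(naive_counts)
print(dedup_counts)
```

In this toy table GENE_A is altered in two samples, yet the naive count reports more events than samples, which is the same pattern as the inflated totals discussed above.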

PoisonAlien, Jan 02 '21 12:01

Wow! Thank you very much! Your reply really clears up my confusion! I have another question. I investigated CNV and calculated the CNV frequency of every gene from the "altered samples" data in gene.summary. What other calculations might be meaningful for measuring the effect of CNV? Thank you very much!
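
For reference, the frequency calculation mentioned here can be sketched in base R as follows. This is a minimal example under stated assumptions: the data frame only stands in for gene.summary, and the column name `AlteredSamples`, the gene names, the counts and the cohort size `n_samples` are all hypothetical.

```r
# Hypothetical stand-in for gene.summary (made-up genes and counts).
gene_summary <- data.frame(
  Hugo_Symbol    = c("GENE_A", "GENE_B", "GENE_C"),
  AlteredSamples = c(30, 12, 3),
  stringsAsFactors = FALSE
)

# Hypothetical total number of samples in the cohort.
n_samples <- 60

# CNV frequency per gene: fraction of samples in which the gene is altered.
gene_summary$cnv_frequency <- gene_summary$AlteredSamples / n_samples

# Sort genes from most to least frequently altered.
gene_summary <- gene_summary[order(-gene_summary$cnv_frequency), ]

print(gene_summary)
```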

DrZhaoJie, Jan 02 '21 13:01

It is unclear what you mean by "other calculation" and "effect of CNV" here.

If you can obtain what you want from gene.summary, why are you looking for other calculations?

ShixiangWang, Jan 05 '21 08:01