superFreq
Retrieve the ploidy fit value / How to compute a Chromosome Instability (CIN) index or Fraction Genome Altered (FGA)?
Two questions:
Is there a text file where the ploidy is written, or does the ploidy fit appear only in the diagnostic plot?
I was wondering whether it is possible to derive a Chromosome Instability (CIN) index or Fraction Genome Altered (FGA) from the output CNAsegments_*.tsv.
Following the CINmetrics package, FGA would be easy to compute. The idea is to keep the segments with |M| >= 0.2; the length of the event could also be a useful filter for CNAs. We then sum the lengths of the retained segments and divide by the genome size for WGS.
Here, with RNA-seq and a large cohort, I would instead divide by the total sum of CNV lengths across all patients. I would do this separately for tumor and normal samples. What do you think?
Another way would be to take the magnitude of the M values into account by summing the product M * length over the CNA segments.
This approach could be applied per chromosome or at the genome scale to help discover groups of samples with lower or higher levels of chromosome rearrangement.
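A minimal sketch of this M-weighted score, per chromosome and genome-wide. The `chr` column name, the use of |M| (so that gains and losses do not cancel), the reuse of the 0.2 cut, and the normalisation by covered length are all assumptions made for illustration; only `M`, `x1` and `x2` are named in this thread.

```python
from collections import defaultdict


def weighted_m_burden(segments, m_cut=0.2):
    """Return (per-chromosome, genome-wide) sum(|M| * length) / covered length.

    `segments` is an iterable of dicts with keys "chr" (assumed name),
    "x1", "x2" and "M". Segments with |M| < m_cut contribute length to the
    denominator but nothing to the numerator.
    """
    altered = defaultdict(float)
    covered = defaultdict(float)
    for seg in segments:
        chrom = str(seg["chr"])
        size = float(seg["x2"]) - float(seg["x1"])
        m = float(seg["M"])
        covered[chrom] += size
        if abs(m) >= m_cut:
            altered[chrom] += abs(m) * size

    total = sum(covered.values())
    if total <= 0:
        raise ValueError("no segment length to normalise by")

    per_chrom = {c: altered[c] / covered[c] for c in covered if covered[c] > 0}
    genome = sum(altered.values()) / total
    return per_chrom, genome


# Toy segments, not real superFreq output.
toy_segments = [
    {"chr": "1", "x1": 0, "x2": 60e6, "M": 0.0},
    {"chr": "1", "x1": 60e6, "x2": 100e6, "M": 1.0},
    {"chr": "2", "x1": 0, "x2": 100e6, "M": -0.5},
]
per_chrom_burden, genome_burden = weighted_m_burden(toy_segments)
print(per_chrom_burden, genome_burden)
```

Computing this for each sample gives one vector of per-chromosome scores per sample, which could then be clustered or compared between the tumor and normal groups.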
This is more of a comment than a question, in fact :) This software is really cool work.
Hi!
The ploidy isn't output explicitly, but it's easy enough to calculate from the segments: it's just the average copy number (so 2*2^M) over the segments, weighted by segment size (x2-x1). And I guess exclude the X and Y chromosomes if you want normal to be ploidy 2.
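As a rough sketch of that calculation in Python (not part of superFreq itself): the segments are assumed to have been read from CNAsegments_*.tsv into dicts, and the chromosome column name `chr` is an assumption, while `x1`, `x2` and `M` are the columns mentioned above.

```python
def estimate_ploidy(segments, exclude=("X", "Y")):
    """Size-weighted mean copy number, with copy number taken as 2 * 2**M."""
    weighted_cn = 0.0
    total_size = 0.0
    for seg in segments:
        chrom = str(seg["chr"])  # "chr" is an assumed column name
        if chrom.lower().startswith("chr"):
            chrom = chrom[3:]  # tolerate "chrX" as well as "X"
        if chrom in exclude:
            continue
        size = float(seg["x2"]) - float(seg["x1"])
        weighted_cn += size * 2.0 * 2.0 ** float(seg["M"])
        total_size += size
    if total_size <= 0:
        raise ValueError("no segments left to average over")
    return weighted_cn / total_size


# Toy segments, not real superFreq output.
example_segments = [
    {"chr": "1", "x1": 0, "x2": 100e6, "M": 0.0},  # copy number 2
    {"chr": "2", "x1": 0, "x2": 100e6, "M": 1.0},  # copy number 4
    {"chr": "X", "x1": 0, "x2": 50e6, "M": -1.0},  # excluded
]
ploidy = estimate_ploidy(example_segments)
print(ploidy)  # 3.0
```

With real data, each row of the segment table would take the place of one dict in the toy list.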
I'm not an expert on chromosome instability measures, but what you suggest seems reasonable. By using M, you're not using the superFreq copy number calling or the uncertainty in M (the width column), and you're not sensitive to copy-number-neutral (CNN) LOH. But just straight-up M with a size cut (around 10 Mbp seems to be a good cut for reliable segments) should be pretty robust. Maybe add a check on "width" as well, but I'd suggest you try some cuts, look manually at what barely gets through or barely doesn't get through, and then adjust the cuts accordingly.
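A minimal sketch of such an FGA score with adjustable cuts, for illustration only. The defaults are the values mentioned in this thread (|M| >= 0.2, 10 Mbp), the `width` cut is optional and its value is left to the user, and dividing by the total covered segment length when no genome size is given is an assumption rather than a recommendation.

```python
def fraction_genome_altered(segments, m_cut=0.2, size_cut=10e6,
                            width_cut=None, genome_size=None):
    """Fraction of the genome in segments that pass the M, size and width cuts.

    `segments` is an iterable of dicts with keys "x1", "x2", "M" and,
    if width_cut is used, "width".
    """
    altered = 0.0
    covered = 0.0
    for seg in segments:
        size = float(seg["x2"]) - float(seg["x1"])
        covered += size
        if size < size_cut:
            continue  # too short to be a reliable segment
        if abs(float(seg["M"])) < m_cut:
            continue  # too close to the normal copy number
        if width_cut is not None and float(seg["width"]) > width_cut:
            continue  # M too uncertain
        altered += size
    denominator = covered if genome_size is None else float(genome_size)
    if denominator <= 0:
        raise ValueError("nothing to normalise by")
    return altered / denominator


# Toy segments, not real superFreq output.
demo_segments = [
    {"x1": 0, "x2": 50e6, "M": 0.0, "width": 0.05},      # unaltered
    {"x1": 50e6, "x2": 80e6, "M": 0.5, "width": 0.1},    # counted
    {"x1": 80e6, "x2": 85e6, "M": -1.0, "width": 0.1},   # below the size cut
    {"x1": 85e6, "x2": 100e6, "M": -0.3, "width": 0.6},  # counted unless width_cut
]
fga_loose = fraction_genome_altered(demo_segments)
fga_strict = fraction_genome_altered(demo_segments, width_cut=0.3)
print(fga_loose, fga_strict)
```

Running it with a few different cuts and looking at which segments flip between counted and not counted is one way to do the manual check described above.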
Things to look out for are ploidy, as you mentioned, where an incorrect call would mess up the FGA score. The other thing is that low-quality samples (for any reason: low-quality input cDNA, small library size, etc.) tend to give more false calls. Apart from outlier samples with quality issues, if you analyse data from different sources there is a very real risk of batch effects related to systematic quality differences. Increasing the size cut and/or the M cut on what you include as copy number alterations controls your sensitivity/accuracy, so these can be adjusted to mitigate quality-related batch effects.
But it seems you have a good idea of what to do, so I don't think I can add much more. Good luck!