Stratified QQ

Open swvanderlaan opened this issue 6 years ago • 9 comments

Hi,

It would be great if you could add a function to make stratified QQ plots, for instance stratified by bins of info-score (e.g. https://github.com/swvanderlaan/MetaGWASToolKit/blob/master/SCRIPTS/plotter.qq_by_info.R) and minor allele frequency (e.g. https://github.com/swvanderlaan/MetaGWASToolKit/blob/master/SCRIPTS/plotter.qq_by_caf.R). These are great diagnostic tools for deciding which filtering settings are best for the data.

Best,

Sander

swvanderlaan avatar Sep 24 '19 08:09 swvanderlaan

Hi Sander,

Thank you for your suggestion. I saw your well-written script, and it is a good reference to follow. With your permission, I will try to implement this in CMplot.

regards, Lilin

YinLiLin avatar Sep 25 '19 06:09 YinLiLin

Thanks for the compliment 👨🏻‍💻😁

That would be great. Please go ahead...

Do you have a timeline? It would be great if you could add ... I might simply switch to your package ...

swvanderlaan avatar Oct 08 '19 11:10 swvanderlaan

Any progress on this?

swvanderlaan avatar May 08 '20 15:05 swvanderlaan

Oh, very sorry about that. I missed your response here, my apologies. I remember checking your script; it seems that we need MAF or other information to achieve this, am I right? If so, it may be a little hard to incorporate this function into CMplot, as CMplot only requires SNP, Chr, Pos, Pvalue1, Pvalue2, Pvalue3, ..., which are generally provided by most GWAS software and can easily be prepared by users.

YinLiLin avatar May 10 '20 08:05 YinLiLin

I would argue that MAF and INFO are available for each GWAS. My idea would be to have stratified QQ plots including lambdas per bin (and potentially counts of variants), as a way to assess the raw results from a GWAS prior to filtering them. Below is an example.

image

These plots are quite informative.

You are right that for final GWAS summary statistics from meta-analyses, INFO might not be available, but for every GWAS the MAF or CAF or EAF or AF should be available. And all GWAS software that I work with - SNPTEST and PLINK for instance - produces (raw) results with these variables.

It would be another plot type, of course, and yes, one would have to supply these data as a prerequisite, with plot_type = "qs" for instance.

Would be great.

Happy to help implement it - if you could help explain a bit more what is what in the CMplot function... :-)

swvanderlaan avatar May 12 '20 09:05 swvanderlaan

Thanks for providing the examples. I agree with you, and the stratified QQ plot using MAF information is definitely worth a try. To avoid breaking the structure of the current data format, how about adding a parameter 'maf' to CMplot that allows users to input the MAF information? Then we can use it to draw the first figure you showed above.

YinLiLin avatar May 13 '20 01:05 YinLiLin

Yes, that would be a great idea. I would go for two flags, maf and info, so users would have to run it twice. And it would affect the data structure, I believe, because instead of maf you'd have info as an extra column.

swvanderlaan avatar Jun 09 '20 20:06 swvanderlaan

While tweaking the script, I found that this can already be achieved with the current version of CMplot. An example is shown below:

library(CMplot)
data(pig60K)

# simulate MAF values for illustration
set.seed(123)
maf <- 0.001 + 0.45 * runif(nrow(pig60K))

# copy the trait1 p-values into one column per MAF bin; SNPs outside a bin stay NA
p1 <- p2 <- p3 <- rep(NA, nrow(pig60K))
p1[maf < 0.05] <- pig60K$trait1[maf < 0.05]
p2[maf >= 0.05 & maf < 0.1] <- pig60K$trait1[maf >= 0.05 & maf < 0.1]
p3[maf >= 0.1] <- pig60K$trait1[maf >= 0.1]
data <- cbind(pig60K[, 1:3], pig60K$trait1, p1, p2, p3)
colnames(data)[-c(1:3)] <- c("All", "maf<0.05", "0.05<=maf<0.1", "maf>=0.1")

# QQ plot of all SNPs and of the three MAF bins
CMplot(data, plot.type = "q", multracks = TRUE, conf.int = FALSE)

The final visualised result: image

Is it consistent with the figures you mentioned above? That means we need to adjust the format of the data manually prior to plotting.
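
The manual reformatting could be wrapped in a small helper. The following is only a sketch in base R: `stratify_pvalues()` is a hypothetical function, not part of CMplot, and the p-values and MAF values are simulated. It uses `cut()` to assign each variant to a bin of the stratifying variable (MAF here, but INFO would work the same way) and returns one p-value column per bin.

```r
# Hypothetical helper (NOT part of CMplot): split one p-value vector into
# one column per bin of a stratifying variable such as MAF or INFO.
stratify_pvalues <- function(p, strat, breaks, prefix = "maf") {
  # right = FALSE gives left-closed bins [a, b); include.lowest closes the last bin
  bins <- cut(strat, breaks = breaks, right = FALSE, include.lowest = TRUE)
  # keep the p-value where the variant falls into the bin, NA elsewhere
  out <- sapply(levels(bins), function(lv) {
    ifelse(!is.na(bins) & bins == lv, p, NA_real_)
  })
  colnames(out) <- paste0(prefix, levels(bins))
  cbind(All = p, out)
}

# Simulated input (assumption: uniform p-values and made-up MAF values)
set.seed(123)
n <- 1000
p <- runif(n)
maf <- 0.001 + 0.45 * runif(n)

strat <- stratify_pvalues(p, maf, breaks = c(0, 0.05, 0.1, 0.5))
colSums(!is.na(strat))  # number of variants available in each column
```

The resulting matrix could then be bound to the SNP, chromosome and position columns with `cbind()` and passed to `CMplot()` in the same way as the data frame in the example above.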

YinLiLin avatar Jun 26 '20 06:06 YinLiLin

Yes, this is a good start. Very good. The only thing lacking is the lambda per bin. That way you can assess whether or not there is inflation originating from that bin.
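
For reference, a lambda per bin could be computed along these lines. This is a minimal sketch in base R using the usual median-based definition of the genomic inflation factor; `lambda_gc()` is a hypothetical helper name, and the p-values and MAF values are simulated under the null, so it only illustrates the calculation.

```r
# Genomic inflation factor: median observed chi-square statistic (1 df)
# divided by the expected median under the null, qchisq(0.5, df = 1).
# lambda_gc() is a hypothetical helper, not a CMplot function.
lambda_gc <- function(p) {
  p <- p[!is.na(p)]
  chisq <- qchisq(p, df = 1, lower.tail = FALSE)
  median(chisq) / qchisq(0.5, df = 1)
}

# Simulated input (assumption: null p-values and made-up MAF values)
set.seed(123)
n <- 10000
p <- runif(n)
maf <- 0.001 + 0.45 * runif(n)

# Same MAF bins as in the example above: <0.05, [0.05, 0.1), >=0.1
bins <- cut(maf, breaks = c(0, 0.05, 0.1, 0.5),
            right = FALSE, include.lowest = TRUE)

lambda_per_bin <- tapply(p, bins, lambda_gc)  # one lambda per MAF bin
n_per_bin <- table(bins)                      # variant counts per bin

data.frame(bin = names(lambda_per_bin),
           lambda = as.numeric(lambda_per_bin),
           n = as.integer(n_per_bin))
```

The lambdas and counts could then be added to the legend or title of each stratum.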

swvanderlaan avatar Jun 30 '20 22:06 swvanderlaan