lassosum
lassosum copied to clipboard
how to use result of lassosum to plink
Hi, When I run the example of lassosum, it is difficult for me to get the same result from lassosum and plink. lassosum code is:
library(lassosum)
setwd(system.file("data", package="lassosum"))
ss <- fread("summarystats.txt")
ref.bfile <- "refpanel"
test.bfile <- "testsample"
LDblocks <- "EUR.hg19"
cor <- p2cor(p = ss$P_val, n = 60000, sign=log(ss$OR_A1))
out <- lassosum.pipeline(cor=cor, chr=ss$Chr, pos=ss$Position,
A1=ss$A1, A2=ss$A2, # A2 is not required but advised
ref.bfile=ref.bfile, test.bfile=test.bfile,
LDblocks = LDblocks)
set.seed(20190822)
pheno <- rnorm(nrow.bfile(out$test.bfile))
v <- validate(out, pheno=pheno)
beta <- v$best.beta
idx_eff <- which(beta != 0) ##lasso zero
idx <- out$sumstats$order[idx_eff]
beta_dat <- data.frame(ss[idx, 1], ss[idx, 3], ss[idx, 4], beta[idx_eff])
test_bim <- data.frame(fread("testsample.bim"))
test_snp <- test_bim[match(beta_dat[, 2], test_bim[, 4]), 2]
beta_dat2 <- data.frame(test_snp, beta_dat[, 3], beta_dat[, 4])
write.table(beta_dat2, file = "eff.txt", col.names = F, row.names = F, quote = F)
plink code is:
cd your/lassosum/datapath/
plink-1.9 --bfile testsample --score eff.txt 1 2 3 sum --out pheno_testsample
Result of lassosum:
head(v$beta.pgs)
-0.418 0.46 -0.451 0.165 0.102
Result of plink:
FID IID PHENO CNT CNT2 SCORESUM
0 HG00096 -9 1634 730 -0.307637
0 HG00097 -9 1634 748 -0.102278
0 HG00099 -9 1634 761 -0.202595
0 HG00100 -9 1634 744 -0.0630223
0 HG00101 -9 1634 734 -0.221991
0 HG00102 -9 1634 740 -0.139318
0 HG00103 -9 1634 716 -0.286505
0 HG00105 -9 1634 722 -0.221223
0 HG00106 -9 1634 746 -0.237664
Could you tell me the mistake? Thanks! Best, Sheng
Following on this thread. If developers can support output the adjusted beta, that would be the best. We can score by plink for support new file format or even dosage format.
Best regards Wallace
lassosum and plink are not doing the same thing so I don't understand why you would expect the same result from lassosum and plink.
What do you mean by adjusted beta? From your code, you have generated beta by beta <- v$best.beta. That is correct. Furthermore, out$keep.test will give you a logical vector to indicate which SNPs in the test .bed file map to the betas.
lassosum and plink are not doing the same thing so I don't understand why you would expect the same result from lassosum and plink.
We don't expect the plink and lassosum do the same thing. We just want to use plink's scoring function to use lassosum's adjusted betas. Only the last step.
Thanks much for the prompt reply.
Best regards Wallace
Can you explain what you mean by "adjusted beta" and how it is supposed to differ from "beta"?
Will the lassosum take in the GWAS summary (univariate beta) and LD structure between snps to update the univariate beta to a new beta (adjusted beta) to control LD between SNPs. And then time the new beta (adjusted beta) with effective allele dosage (0/1/2) to have the PRS? Apologize if my understanding is wrong.
Then our purpose is to use the new beta (adjusted beta) to calculate individual PRS by plink other than by lassosum, picking up the best parameter combination by lassosum is OK.
Best regards Wallace
If X denote the test genotype matrix, then v$best.pgs = X %*% v$best.beta, which is the "adjusted beta" you described. It could be different from what you get from plink because, (1) plink scales the PGS after X %*% v$best.beta (according the number of SNPs used), (2) the handling of missing genotype in plink maybe different. Try specifying --fill-missing-a2 in plink, and it should have a correlation of nearly 1 with v$best.pgs.
What I understand from the question:
v$beta.pgsis already the "adjusted betas" (meaning directly usable on 0/1/2 genotypes?)- use these effects to compute individual scores with PLINK
- individual scores computed with PLINK are not the same as
v$best.pgs(correlation of 0.573) - PLINK option
--fill-missing-a2does not work with--score(and it does not seem there is missing data in this example data)
@privefl, I observed the same thing a while ago as you did.
Best regards Wallace
I use the best.pgs from validate. Then I calculate the R2 and MSE between predicted phenotype and testing phenotype. R2 looks good, but MSE is bias.
Best,
Sheng
Hi,
Following on the same thread, would it be possible to export the "v$best.beta" along with their corresponding SNP ids, a1 and a2 alleles? I managed to export the adjusted betas and the SNPids, however I'm unsure about the a1 that is used by lassosum. Does it swap the alleles while adjusting the betas? If yes, is there a way to export it too?
Best, Dheeraj.
Hi, When I use the v0.4.4, the bug is fixed. Both mse and r2 looks good. Thanks!
Best, Sheng
@dheerajbobbili1988 ,
Yes. If out is the output from the lassosum.pipeline() function, then the out$sumstats object is a data.frame which gives the details of the SNPs used in v$best.beta. You can simply concatenate the two, e.g. with, sumstats <- cbind(out$sumstats, v$best.beta).
Incidentally, if you know the A1,A2 of the bfile is the same as that in test.bfile, you can use pgs(test.bfile, v$best.beta) to generate the PGS also. The pgs function also supports parallel processing using the cluster option.
Yes, it seemed to work. Thanks a lot!!
I'm trying to export the the final pgs using the option: pgs(out$test.bfile, target.res$best.beta)
However, I'm getting the following error. How to resolve this? TIA
Error in .pgs.default(weights = weights, bfile = bfile, ...) : Number of rows in (or vector length of) weights does not match number of selected columns in bfile