lassosum icon indicating copy to clipboard operation
lassosum copied to clipboard

how to use result of lassosum to plink

Open biostat0903 opened this issue 6 years ago • 16 comments

Hi, When I run the example of lassosum, it is difficult for me to get the same result from lassosum and plink. lassosum code is:

library(lassosum)
setwd(system.file("data", package="lassosum")) 
ss <- fread("summarystats.txt")
ref.bfile <- "refpanel"
test.bfile <- "testsample"
LDblocks <- "EUR.hg19"
cor <- p2cor(p = ss$P_val, n = 60000, sign=log(ss$OR_A1))
out <- lassosum.pipeline(cor=cor, chr=ss$Chr, pos=ss$Position, 
                         A1=ss$A1, A2=ss$A2, # A2 is not required but advised
                         ref.bfile=ref.bfile, test.bfile=test.bfile,
                         LDblocks = LDblocks)
set.seed(20190822)
pheno <- rnorm(nrow.bfile(out$test.bfile))
v <- validate(out, pheno=pheno)
beta <- v$best.beta
idx_eff <- which(beta != 0) ##lasso zero
idx <- out$sumstats$order[idx_eff]
beta_dat <- data.frame(ss[idx, 1], ss[idx, 3], ss[idx, 4], beta[idx_eff])
test_bim <- data.frame(fread("testsample.bim"))
test_snp <- test_bim[match(beta_dat[, 2], test_bim[, 4]), 2]
beta_dat2 <- data.frame(test_snp, beta_dat[, 3], beta_dat[, 4])
write.table(beta_dat2, file = "eff.txt", col.names = F, row.names = F, quote = F)

plink code is:

cd your/lassosum/datapath/
plink-1.9 --bfile testsample --score eff.txt 1 2 3 sum --out pheno_testsample

Result of lassosum:

head(v$beta.pgs)
-0.418 0.46 -0.451 0.165 0.102

Result of plink:

 FID       IID  PHENO    CNT   CNT2 SCORESUM
   0   HG00096     -9   1634    730 -0.307637
   0   HG00097     -9   1634    748 -0.102278
   0   HG00099     -9   1634    761 -0.202595
   0   HG00100     -9   1634    744 -0.0630223
   0   HG00101     -9   1634    734 -0.221991
   0   HG00102     -9   1634    740 -0.139318
   0   HG00103     -9   1634    716 -0.286505
   0   HG00105     -9   1634    722 -0.221223
   0   HG00106     -9   1634    746 -0.237664

Could you tell me the mistake? Thanks! Best, Sheng

biostat0903 avatar Aug 22 '19 16:08 biostat0903

Following on this thread. If developers can support output the adjusted beta, that would be the best. We can score by plink for support new file format or even dosage format.

Best regards Wallace

wavefancy avatar Sep 10 '19 20:09 wavefancy

lassosum and plink are not doing the same thing so I don't understand why you would expect the same result from lassosum and plink.

tshmak avatar Sep 11 '19 02:09 tshmak

What do you mean by adjusted beta? From your code, you have generated beta by beta <- v$best.beta. That is correct. Furthermore, out$keep.test will give you a logical vector to indicate which SNPs in the test .bed file map to the betas.

tshmak avatar Sep 11 '19 02:09 tshmak

lassosum and plink are not doing the same thing so I don't understand why you would expect the same result from lassosum and plink.

We don't expect the plink and lassosum do the same thing. We just want to use plink's scoring function to use lassosum's adjusted betas. Only the last step.

Thanks much for the prompt reply.

Best regards Wallace

wavefancy avatar Sep 11 '19 02:09 wavefancy

Can you explain what you mean by "adjusted beta" and how it is supposed to differ from "beta"?

tshmak avatar Sep 11 '19 08:09 tshmak

Will the lassosum take in the GWAS summary (univariate beta) and LD structure between snps to update the univariate beta to a new beta (adjusted beta) to control LD between SNPs. And then time the new beta (adjusted beta) with effective allele dosage (0/1/2) to have the PRS? Apologize if my understanding is wrong.

Then our purpose is to use the new beta (adjusted beta) to calculate individual PRS by plink other than by lassosum, picking up the best parameter combination by lassosum is OK.

Best regards Wallace

wavefancy avatar Sep 11 '19 13:09 wavefancy

If X denote the test genotype matrix, then v$best.pgs = X %*% v$best.beta, which is the "adjusted beta" you described. It could be different from what you get from plink because, (1) plink scales the PGS after X %*% v$best.beta (according the number of SNPs used), (2) the handling of missing genotype in plink maybe different. Try specifying --fill-missing-a2 in plink, and it should have a correlation of nearly 1 with v$best.pgs.

tshmak avatar Sep 12 '19 01:09 tshmak

What I understand from the question:

  • v$beta.pgs is already the "adjusted betas" (meaning directly usable on 0/1/2 genotypes?)
  • use these effects to compute individual scores with PLINK
  • individual scores computed with PLINK are not the same as v$best.pgs (correlation of 0.573)
  • PLINK option --fill-missing-a2 does not work with --score (and it does not seem there is missing data in this example data)

privefl avatar Sep 12 '19 04:09 privefl

@privefl, I observed the same thing a while ago as you did.

Best regards Wallace

wavefancy avatar Sep 12 '19 07:09 wavefancy

I use the best.pgs from validate. Then I calculate the R2 and MSE between predicted phenotype and testing phenotype. R2 looks good, but MSE is bias. Best, Sheng

biostat0903 avatar Jan 28 '20 18:01 biostat0903

Hi,

Following on the same thread, would it be possible to export the "v$best.beta" along with their corresponding SNP ids, a1 and a2 alleles? I managed to export the adjusted betas and the SNPids, however I'm unsure about the a1 that is used by lassosum. Does it swap the alleles while adjusting the betas? If yes, is there a way to export it too?

Best, Dheeraj.

dheerajbobbili1988 avatar Jul 22 '20 20:07 dheerajbobbili1988

Hi, When I use the v0.4.4, the bug is fixed. Both mse and r2 looks good. Thanks!

Best, Sheng

biostat0903 avatar Aug 12 '20 18:08 biostat0903

@dheerajbobbili1988 ,

Yes. If out is the output from the lassosum.pipeline() function, then the out$sumstats object is a data.frame which gives the details of the SNPs used in v$best.beta. You can simply concatenate the two, e.g. with, sumstats <- cbind(out$sumstats, v$best.beta).

tshmak avatar Aug 13 '20 01:08 tshmak

Incidentally, if you know the A1,A2 of the bfile is the same as that in test.bfile, you can use pgs(test.bfile, v$best.beta) to generate the PGS also. The pgs function also supports parallel processing using the cluster option.

tshmak avatar Aug 13 '20 02:08 tshmak

Yes, it seemed to work. Thanks a lot!!

dheerajbobbili1988 avatar Aug 13 '20 22:08 dheerajbobbili1988

I'm trying to export the the final pgs using the option: pgs(out$test.bfile, target.res$best.beta)

However, I'm getting the following error. How to resolve this? TIA

Error in .pgs.default(weights = weights, bfile = bfile, ...) : Number of rows in (or vector length of) weights does not match number of selected columns in bfile

Mahantesh-Biradar avatar Aug 16 '21 12:08 Mahantesh-Biradar