Read LD from those 1kg_ldgm.*.bcf files
Dear Giulio:
I downloaded the 1kg_ldgm.*.bcf files for AFR, EAS, EUR, HIS, SAS.
Now let's say that I have a SNP list file containing 100 SNPs within an LD block. How can I quickly extract the pairwise LD matrix for these 100 SNPs using bcftools and the 1kg_ldgm.*.bcf files?
Previously, I have been using plink2, with the following simple command:
plink2 --bfile 1kg.chr? --r --ld-snp-list my-snps.txt
Thank you & best regards, Jie
The matrices you downloaded are precision matrices rather than correlation matrices; that is, they are regularized versions of the inverses of the correlation matrices. You would have to load the whole matrix and invert it to retrieve the correlation between two SNPs. Unfortunately, there is no way to do so with BCFtools. You would have to code it yourself 😞
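To make the "invert it" step concrete, here is a minimal NumPy sketch. It assumes the precision matrix for one LD block has already been loaded into a dense array; how to get it out of the .bcf file is not covered here. The function name `ld_from_precision` and the toy 4-SNP matrix are made up for illustration, and the rescaling to a unit diagonal is my own assumption, since a regularized inverse may not invert back to an exact correlation matrix.

```python
import numpy as np

def ld_from_precision(precision, snp_idx):
    """Correlation submatrix for the SNPs in snp_idx, given a precision matrix.

    Hypothetical helper: `precision` is assumed to be already loaded as a
    square array for a single LD block.
    """
    precision = np.asarray(precision, dtype=float)
    snp_idx = np.asarray(snp_idx)
    n = precision.shape[0]
    # Solve P X = E for only the requested unit vectors, which yields the
    # matching columns of the inverse without forming the full inverse.
    unit_cols = np.eye(n)[:, snp_idx]
    inv_cols = np.linalg.solve(precision, unit_cols)
    cov = inv_cols[snp_idx, :]
    # Assumption: rescale so the diagonal is exactly 1.
    scale = np.sqrt(np.diag(cov))
    return cov / np.outer(scale, scale)

# Toy data: 4 SNPs whose true correlation decays as 0.5 ** |i - j|.
positions = np.arange(4)
true_r = 0.5 ** np.abs(positions[:, None] - positions[None, :])
toy_precision = np.linalg.inv(true_r)

# Recover r between the first and third SNP from the precision matrix alone.
ld_sub = ld_from_precision(toy_precision, [0, 2])
print(ld_sub)
```

For a real block with thousands of SNPs you would probably want a sparse solver rather than a dense one, but the idea is the same: each requested SNP costs one linear solve against the whole block.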
Dear Giulio:
The LDGM LD matrices are stored in files such as 1kg_ldgm.EUR.bcf. If we don't use BCFtools to read the .bcf files, what are we supposed to use instead?
Right now, the popular LDSC software provides LD matrices in files named like 1.l2.ldscore.gz, and the popular PRS-CS software provides LD matrices in files named like ldblk_1kg_chr1.hdf5.
In essence, the contents of these files are simply raw LD r values, which the software then extracts using a library or custom code. Aren't LDGM LD files such as 1kg_ldgm.EUR.bcf storing the same LD r values and being used in a similar way?
It would be great if you could provide a simple example of what "regularized versions of the inverses of the correlation matrices" means.
Best regards, Jie
A regularized version of the inverse of the correlation matrix is an extremely sparse and efficient representation of LD, which can achieve 1000x space savings. However, what is stored is not LD r values. See here for a full explanation. You can peek into these files, but ultimately they are meant to be used with BCFtools/blup and BCFtools/pgs. Currently, no other software can use them for other computations.
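As a toy illustration of why an inverse can be so much sparser than the correlation matrix itself, consider six hypothetical SNPs whose correlation decays as 0.8 ** |i - j|. Every pair is correlated, so the correlation matrix is fully dense, yet its inverse is tridiagonal. This is only a sketch of the general idea: the decay model and the 1e-8 cutoff are assumptions chosen for the example, not how the LDGM files were actually built.

```python
import numpy as np

# Hypothetical block of 6 SNPs with correlation r_ij = 0.8 ** |i - j|.
n_snps = 6
positions = np.arange(n_snps)
corr = 0.8 ** np.abs(positions[:, None] - positions[None, :])

# The precision matrix is the inverse of the correlation matrix.
precision = np.linalg.inv(corr)

# Count entries that are not numerically zero (cutoff is an assumption).
cutoff = 1e-8
dense_nonzero = int(np.sum(np.abs(corr) > cutoff))
sparse_nonzero = int(np.sum(np.abs(precision) > cutoff))

print("non-zero entries in correlation matrix:", dense_nonzero)
print("non-zero entries in precision matrix:  ", sparse_nonzero)
print(np.round(precision, 3))
```

Here the correlation matrix needs all 36 entries, while the precision matrix has only the diagonal and its two neighbouring bands. The values stored in the precision matrix are not r values, which is why they cannot be read off directly as pairwise LD.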