pandas-plink
Genotype code wrong!
Hi there,
I was surprised to find a major error in your pandas-plink package! I used it to read .bed/.bim/.fam files into Python, but I found that minor-allele homozygotes were coded as 0 and major-allele homozygotes were coded as 2! That inverts all my results! Can you please check your source code and correct this error? Otherwise, it's really dangerous for others to continue using pandas-plink!
Hi @liqingbioinfo, I also found this. You can, however, fix it using ref="a0":
from os.path import join
from pandas_plink import read_plink1_bin
from pandas_plink import get_data_folder
G = read_plink1_bin(join(get_data_folder(), "chr*.bed"), verbose=False, ref="a0")
G[0:5, 0:5].compute() # the first 5x5
Which should get you:
array([[2., 2., 0., 2., 2.],
[2., 1., 0., 2., 2.],
[2., 2., 0., 1., 2.],
[2., 2., 0., 2., 2.],
[2., 2., 0., 2., 2.]], dtype=float32)
As opposed to using ref="a1" (which is the default):
array([[0., 0., 2., 0., 0.],
[0., 1., 2., 0., 0.],
[0., 0., 2., 1., 0.],
[0., 0., 2., 0., 0.],
[0., 0., 2., 0., 0.]], dtype=float32)
This is also stated in the docs:
ref (str) – Reference allele. Specify which allele the dosage matrix will count. It can be either "a1" (default) or "a0".
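If the data has already been loaded with the default ref="a1", the two outputs above suggest the matrices are complements of each other: each entry under ref="a0" equals 2 minus the entry under ref="a1". Here is a minimal sketch of that flip in plain Python, assuming biallelic variants with dosages in {0, 1, 2}. The helper name `flip_dosage` is hypothetical and not part of pandas-plink.

```python
def flip_dosage(rows):
    """Switch which allele is counted by mapping each dosage x to 2 - x.

    Hypothetical helper (not part of pandas-plink). Assumes biallelic
    variants with dosages in {0, 1, 2}. Missing values stored as NaN
    stay NaN, because 2 - NaN is NaN.
    """
    return [[2.0 - x for x in row] for row in rows]


# The 5x5 block shown above for ref="a1" (the default).
a1_counts = [
    [0.0, 0.0, 2.0, 0.0, 0.0],
    [0.0, 1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0, 0.0],
]

# Flipping it reproduces the block shown above for ref="a0".
a0_counts = flip_dosage(a1_counts)
for row in a0_counts:
    print(row)
```

On the array returned by read_plink1_bin, the same flip should be expressible as the elementwise expression 2 - G, though re-reading the files with ref="a0" is the documented route.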