pandas-plink icon indicating copy to clipboard operation
pandas-plink copied to clipboard

Genotype code wrong!

Open liqingbioinfo opened this issue 2 years ago • 1 comments

Hi there,

 I was surprised to find a major error in your pandas-plink package! I used it to read /bed/bim/fam files into python. But I found that my minor alleles homozygous were coded as 0 and major alleles homozygous were coded as 2!!! That invests all my results! Can you please check your source code to correct such an error! Otherwise, it's really dangerous for others to continue using this pandas-plink package!!!

image

liqingbioinfo avatar Feb 26 '22 00:02 liqingbioinfo

Hi @liqingbioinfo also found this. You can however fix it using ref="a0":

from os.path import join
from pandas_plink import read_plink1_bin
from pandas_plink import get_data_folder

G = read_plink1_bin(join(get_data_folder(), "chr*.bed"), verbose=False, ref="a0")
G[0:5, 0:5].compute() # the first 5x5

Which should get you:

array([[2., 2., 0., 2., 2.],
       [2., 1., 0., 2., 2.],
       [2., 2., 0., 1., 2.],
       [2., 2., 0., 2., 2.],
       [2., 2., 0., 2., 2.]], dtype=float32)

As opposed to using ref="a1" (which is the default):

array([[0., 0., 2., 0., 0.],
       [0., 1., 2., 0., 0.],
       [0., 0., 2., 1., 0.],
       [0., 0., 2., 0., 0.],
       [0., 0., 2., 0., 0.]], dtype=float32)

This is also stated in the docs:

ref (str) – Reference allele. Specify which allele the dosage matrix will count. It can be either "a1" (default) or "a0".

KennethEnevoldsen avatar Apr 11 '22 14:04 KennethEnevoldsen