Update on dosage genotypes to accommodate imputed genotypes

Open hsun3163 opened this issue 2 years ago • 7 comments

  1. pgen can store and output dosage from the imputed genotypes

  2. Our existing pipeline should be able to work on pgen/bim/fam file sets, which would require the minimum amount of change (test pending).

  3. Hacking the function that tensorQTL uses, pandas-plink.read_plink(), has proven to be too difficult.

  4. pgenlib, which is supposedly the official plink2 reader:

     4.1 is at version 0.81 and still under development;

     4.2 has rather poor documentation (https://github.com/BertrandServin/pgenlib and https://github.com/chrchang/plink-ng/blob/master/2.0/Python/python_api.txt). Still, I managed to extract the dosage for all samples from 1 SNP (the function provided only works on 1 SNP at a time, so we need to loop through each of the 800000+ SNPs).

     4.3 The plan is to read a bed file (which pgenlib can also handle), fill the genotype matrix from it, and compare that matrix with the one generated by tensorQTL to verify that we are doing it right.

  5. If 4 fails, it turns out that tensorQTL also has a function that takes vcf.gz as genotype input, which would require converting the per-chromosome plink files into VCF format. The behavior of these functions has not been explored yet.
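
A rough sketch of what items 4.2 and 4.3 could look like, not the pipeline's actual code. The helper names (`open_pgen`, `read_dosage_matrix`, `matrices_agree`) are made up for illustration. It assumes that `pgenlib.PgenReader.read_dosages(variant_idx, out)` fills a float32 buffer for one variant, and that missing dosages come back as -9; both should be checked against python_api.txt before relying on them.

```python
import numpy as np

MISSING = -9.0  # assumption: pgenlib marks missing dosages with -9


def open_pgen(path, raw_sample_ct=None):
    """Open a .pgen file (or a .bed file, if raw_sample_ct is given)."""
    import pgenlib  # imported lazily so the helpers below work without it
    return pgenlib.PgenReader(path.encode("utf8"), raw_sample_ct=raw_sample_ct)


def read_dosage_matrix(reader, variant_ct, sample_ct):
    """Fill a (variants x samples) matrix by reading one variant per call."""
    dosages = np.empty((variant_ct, sample_ct), dtype=np.float32)
    buf = np.empty(sample_ct, dtype=np.float32)  # reused output buffer
    for v in range(variant_ct):
        reader.read_dosages(v, buf)  # fills buf in place for variant v
        dosages[v, :] = buf
    return dosages


def matrices_agree(ours, reference, atol=1e-6):
    """Compare two genotype matrices, treating -9 and NaN as missing."""
    ours = np.where(ours == MISSING, np.nan, ours).astype(np.float64)
    reference = np.where(reference == MISSING, np.nan, reference).astype(np.float64)
    return ours.shape == reference.shape and bool(
        np.allclose(ours, reference, atol=atol, equal_nan=True)
    )
```

With a real file, the counts would presumably come from `reader.get_variant_ct()` and `reader.get_raw_sample_ct()`. The reference matrix for `matrices_agree` would be whatever tensorQTL loads from the same bed file, after making sure both count the same allele and use the same variants-by-samples orientation.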

hsun3163, Aug 31 '22 03:08