xqtl-protocol
Update on dosage genotypes to accommodate imputed genotypes
- pgen can store and output dosages from imputed genotypes.
- Our existing pipeline should be able to work on pgen/bim/fam file sets, which would require the least change. (test pending)
- Hacking the function tensorQTL uses, pandas-plink.read_plink(), has proven to be too difficult.
- The pgenlib, which is supposedly the official plink2 reader:
  4.1 is in version 0.81 and still under development.
  4.2 has rather poor documentation: https://github.com/BertrandServin/pgenlib and https://github.com/chrchang/plink-ng/blob/master/2.0/Python/python_api.txt. But I somehow managed to extract the dosage for all samples from 1 SNP (the function provided can only work on 1 SNP at a time, so we need to loop through each of the 800000+ SNPs).
  4.3 The plan is to read a bed file (which pgenlib can also handle), fill the genotype matrix from it, and compare it with the one generated by tensorQTL to verify that we are doing it right.
- If 4 fails, it turns out that tensorQTL also has a function that takes vcf.gz as genotype input, which would require converting the per-chromosome plink files into VCF format. The behavior of these functions has not been explored yet.
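
The per-SNP loop in 4.2 and the check in 4.3 could be sketched roughly as follows. This is only a minimal sketch and has not been run against real data. It assumes the reader object exposes `get_variant_ct()`, `get_raw_sample_ct()` and `read_dosages()`, the method names listed in python_api.txt. `fill_dosage_matrix`, `same_genotypes` and the file name in the docstring are hypothetical names used for illustration.

```python
import numpy as np


def fill_dosage_matrix(reader):
    """Build a (variants x samples) dosage matrix one variant at a time.

    `reader` is assumed to provide get_variant_ct(), get_raw_sample_ct() and
    read_dosages(variant_idx, out_buffer). With pgenlib this would be
    something like pgenlib.PgenReader(b"geno.pgen"); the file name is
    hypothetical and is passed as bytes.
    """
    n_variants = reader.get_variant_ct()
    n_samples = reader.get_raw_sample_ct()
    dosages = np.empty((n_variants, n_samples), dtype=np.float32)
    buf = np.empty(n_samples, dtype=np.float32)  # reused for every variant
    for i in range(n_variants):
        reader.read_dosages(i, buf)  # fills buf in place for variant i
        dosages[i, :] = buf
    return dosages


def same_genotypes(ours, reference, tol=1e-4):
    """Return True if both matrices have the same shape and agree within tol."""
    ours = np.asarray(ours, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    if ours.shape != reference.shape:
        return False
    return bool(np.allclose(ours, reference, atol=tol, equal_nan=True))
```

The matrix from tensorQTL would be passed as `reference`. If the two disagree everywhere, it may be worth checking whether the two readers count different alleles before concluding that the pgenlib read is wrong.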