pandas-plink
Chromosome names for X, Y and MT?
Sorry if I'm doing something wrong, but when I use plink2 ... --recode vcf I get chromosomes called 21, 22, X, Y and even MT. However, using read_plink(files), they are encoded as 21, 22, 23, 24 and 25.
I know this encoding is expected: https://www.cog-genomics.org/plink/1.9/input
Given diploid autosomes, the remaining modifiers let you indicate the absence of specific non-autosomal chromosomes, as an extra sanity check on the input data. Note that, when there are n autosome pairs, the X chromosome is assigned numeric code n+1, Y is n+2, XY (pseudo-autosomal region of X) is n+3, and MT (mitochondria) is n+4.
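The quoted rule is simple arithmetic on the number of autosome pairs. Here is a small sketch of it, assuming the codes appear as strings in the chrom column. The function name plink_nonautosomal_codes is hypothetical and not part of pandas-plink. Note that under this rule, with 22 autosome pairs, code 25 is XY and MT would be 26, so it may be worth checking which codes the .bim file actually contains.

```python
def plink_nonautosomal_codes(n_autosomes: int = 22) -> dict:
    """Hypothetical helper: map PLINK numeric codes to chromosome names,
    following the rule X = n+1, Y = n+2, XY = n+3, MT = n+4."""
    names = ["X", "Y", "XY", "MT"]
    # Keys are strings because chromosome codes are read from a text file
    return {str(n_autosomes + i): name for i, name in enumerate(names, start=1)}


print(plink_nonautosomal_codes())
# {'23': 'X', '24': 'Y', '25': 'XY', '26': 'MT'}
```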
However, is there a way to 'fix it' in the output, like --recode vcf does?
I don't see anything in the documentation about this...
I'm currently writing files out as:
```python
...
# Find the SNVs
p = bim.a0.str.len() == 1
q = bim.a1.str.len() == 1
snv = bim[p & q]
print("SNVs:", snv.shape)
snv.to_csv("sensible_name.tsv", sep="\t", columns=["chrom", "pos", "snp", "a0", "a1"], index=False)
```
So I'm trying to avoid going in and editing the DataFrame line by line...
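A row-by-row loop shouldn't be needed. A single vectorized replace on the chrom column just before writing the TSV should do it. Below is a minimal sketch, assuming the bim DataFrame from read_plink has a chrom column that holds text or categorical codes (it is cast to str first, so integer codes would also work) and that the codes follow the PLINK 1.9 rule quoted above. CHROM_NAMES and rename_chroms are hypothetical names, and the toy DataFrame only stands in for real data.

```python
import pandas as pd

# Hypothetical mapping for 22 autosome pairs, per the PLINK rule
# (X = 23, Y = 24, XY = 25, MT = 26); edit it if your files differ.
CHROM_NAMES = {"23": "X", "24": "Y", "25": "XY", "26": "MT"}


def rename_chroms(bim: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `bim` with numeric sex/MT codes replaced by names."""
    out = bim.copy()
    # Cast to str so the lookup works for int, str or categorical columns
    out["chrom"] = out["chrom"].astype(str).replace(CHROM_NAMES)
    return out


# Toy stand-in for the `bim` DataFrame returned by read_plink
bim = pd.DataFrame({
    "chrom": ["21", "22", "23", "24", "26"],
    "snp": ["rs1", "rs2", "rs3", "rs4", "rs5"],
    "pos": [100, 200, 300, 400, 500],
    "a0": ["A", "C", "G", "T", "A"],
    "a1": ["G", "T", "A", "C", "G"],
})
print(rename_chroms(bim)["chrom"].tolist())
# ['21', '22', 'X', 'Y', 'MT']
```

In the workflow above, this would mean something like snv = rename_chroms(bim[p & q]) before calling to_csv. If your files really do use 25 for MT, as observed above, change the dictionary accordingly.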