23andme2vcf
hg18 reference
hi rob,
i tried using your script to convert the 23andme data from the personal genome project. it appears that this data is still aligned to hg18. could you provide the corresponding reference, or let me know how to build it myself (for example, do you have a script that derives it from the ncbi fasta files?)
thanks tim
for now i worked around the problem via liftOver (see below), but i still get ~30k sites that were not included. it might be better to get a proper reference from you, if you don't mind adding support for hg18. what i did:
unzip ref files
for f in *_ref.txt.gz; do gunzip "$f"; done
convert to bed format (losing allele info!)
for f in *_ref.txt; do awk 'BEGIN{FS="\t";OFS="\t"}{print $1,$2,$2+1,$3}' "$f" > "${f/.txt/.bed}"; done
liftOver to hg18
for f in *.bed; do ../liftOver/liftOver "$f" ../liftOver/hg19ToHg18.over.chain.gz "${f/hg19/hg18}" "${f/.bed/.unmapped.bed}"; done
re-convert to txt format and add allele info (which is assumed not to change) from hg19
for f in *_hg18_ref.bed; do awk 'BEGIN{FS="\t";OFS="\t"}{print $1,$2,$4}' "$f" | paste - <(cut -f4 "${f/hg18_ref.bed/hg19_ref.txt}") > "${f/.bed/.txt}"; done
re-zip ref files
for f in *_ref.txt; do gzip -9 "$f"; done
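one caveat with the steps above: paste pairs rows purely by line number, so if liftOver drops any site into the unmapped file, the alleles of all following rows in that file may end up shifted. a possible variant, sketched below, packs the id and allele into the bed name column so they travel through liftOver together with the coordinates. it also writes 0-based starts, which is what bed expects. this is only a sketch under assumptions: the column layout (chrom, 1-based position, id, allele) is inferred from the commands above, the function names and the ":" separator are made up for illustration, and alleles are still assumed not to change between hg19 and hg18.

```shell
#!/usr/bin/env bash
# Assumed input layout (inferred from the commands above, not confirmed):
# tab-separated columns chrom, 1-based position, id, allele.

# Convert a reference table to BED: 0-based start, exclusive end, and the
# id and allele joined with ":" in the name column so both survive liftOver.
# Assumes ids never contain ":".
ref_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" } { print $1, $2 - 1, $2, $3 ":" $4 }' "$1"
}

# Convert a (lifted) BED file back to the table layout. The BED end column
# equals the 1-based position of a single-base interval.
bed_to_ref() {
    awk 'BEGIN { FS = OFS = "\t" } { split($4, a, ":"); print $1, $3, a[1], a[2] }' "$1"
}

# Tally the reasons liftOver recorded in an unmapped file. Each rejected
# record is preceded by a line starting with "#" that states the reason.
count_unmapped() {
    grep '^#' "$1" | sort | uniq -c
}

# Example round trip (file names are hypothetical, liftOver must be on PATH):
#   ref_to_bed chr1_hg19_ref.txt > chr1_hg19_ref.bed
#   liftOver chr1_hg19_ref.bed hg19ToHg18.over.chain.gz \
#       chr1_hg18_ref.bed chr1_hg19_ref.unmapped.bed
#   bed_to_ref chr1_hg18_ref.bed > chr1_hg18_ref.txt
#   count_unmapped chr1_hg19_ref.unmapped.bed
```

the tally from count_unmapped might help to see why the ~30k sites go missing, since it groups the dropped records by the reason liftOver gives for each.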