23andme2vcf

hg18 reference

Open tikacp opened this issue 7 years ago • 1 comments

hi rob,

i tried using your script to convert the 23andme data from the personal genome project. it appears that the data is still aligned to hg18. could you provide the corresponding reference or let me know how to compile it myself (for example, do you have a script that derives it from the ncbi fasta files)?

thanks tim

tikacp, Feb 19 '18 14:02

for now i worked around the problem via liftOver (see below), but i still get ~30k sites that were not included. it might be better to get a proper reference from you, if you don't mind adding support for hg18. what i did:

unzip ref files

for f in *_ref.txt.gz; do gunzip "$f"; done

convert to bed format (losing allele info!)

for f in *_ref.txt; do awk 'BEGIN{FS="\t";OFS="\t"}{print $1,$2,$2+1,$3}' "$f" >"${f/.txt/.bed}"; done
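
BED intervals are 0-based and half-open, so a 1-based position p is conventionally written as start p-1, end p. Below is a minimal sketch of the conversion under that convention, assuming the four-column layout that the commands in this thread imply (chromosome, position, rsID, reference allele). `ref_to_bed` is a hypothetical helper name, not part of 23andme2vcf. With this layout the lifted position would be read back from the end column (column 3) rather than the start column.

```shell
# ref_to_bed REF_TXT
# Hypothetical helper (not part of 23andme2vcf).
# Input : tab-separated chrom, 1-based pos, rsid, ref allele (assumed layout).
# Output: BED4 with 0-based half-open coordinates: chrom, pos-1, pos, rsid.
ref_to_bed() {
    awk 'BEGIN { FS = "\t"; OFS = "\t" } { print $1, $2 - 1, $2, $3 }' "$1"
}

# Usage:
#   for f in *_ref.txt; do ref_to_bed "$f" > "${f%.txt}.bed"; done
```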

liftOver to hg18

for f in *.bed; do ../liftOver/liftOver "$f" ../liftOver/hg19ToHg18.over.chain.gz "${f/hg19/hg18}" "${f/.bed/.unmapped.bed}"; done
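
liftOver writes the records it could not convert to the unmapped file, each preceded by a comment line starting with `#` that gives the reason. Below is a small sketch for tallying those reasons, which may help explain where the missing sites went. `count_unmapped_reasons` is a hypothetical helper name.

```shell
# count_unmapped_reasons UNMAPPED_BED
# Hypothetical helper: counts the '#' reason lines in a liftOver unmapped file
# and prints "count<TAB>reason", most frequent first.
count_unmapped_reasons() {
    awk '/^#/ { n[$0]++ } END { for (r in n) print n[r] "\t" r }' "$1" | sort -rn
}

# Usage:
#   for f in *.unmapped.bed; do echo "$f"; count_unmapped_reasons "$f"; done
```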

re-convert to txt format and add allele info (which is assumed not to change) from hg19

for f in *_hg18_ref.bed; do awk 'BEGIN{FS="\t";OFS="\t"}{print $1,$2,$4}' "$f" | paste - <(cut -f4 "${f/hg18_ref.bed/hg19_ref.txt}") >"${f/.bed/.txt}"; done
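
Because liftOver drops unmapped records, the lifted BED file can have fewer lines than the hg19 file, and a line-by-line `paste` would then pair positions with alleles from the wrong rows. Below is a sketch that looks the allele up by rsID instead. It assumes rsIDs are unique within a file and that the hg19 file has the four-column layout implied above (chromosome, position, rsID, allele). `bed_to_ref` is a hypothetical helper name. Like the command above, it takes the position from the BED start column and still assumes the allele does not change between builds. If the BED was written with 0-based starts, print the end column (`$3`) instead.

```shell
# bed_to_ref LIFTED_BED HG19_REF
# Hypothetical helper: rebuilds a 4-column ref file from a lifted BED4 file,
# taking each allele from the hg19 ref by rsID rather than by line number.
bed_to_ref() {
    awk 'BEGIN { FS = "\t"; OFS = "\t" }
         # First file (hg19 ref): remember rsid -> allele.
         NR == FNR { allele[$3] = $4; next }
         # Second file (lifted BED): chrom, start, rsid, plus the stored allele.
         ($4 in allele) { print $1, $2, $4, allele[$4] }
    ' "$2" "$1"
}

# Usage:
#   for f in *_hg18_ref.bed; do
#       bed_to_ref "$f" "${f%hg18_ref.bed}hg19_ref.txt" > "${f%.bed}.txt"
#   done
```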

re-zip ref files

for f in *_ref.txt; do gzip -9 "$f"; done

tikacp, Feb 19 '18 14:02