hg18 reference

Open tikacp opened this issue 7 years ago • 1 comments

hi rob,

i tried using your script to convert the 23andme data from the personal genome project. it appears that the data is still aligned to hg18. could you provide the corresponding reference, or let me know how to compile it myself (for example, do you have a script that derives it from the ncbi fasta files)?

thanks tim

Feb 19 '18 14:02 tikacp

for now i worked around the problem via liftOver (see below), but i still get ~30k sites that were not included. it might be better to get a proper reference from you, if you don't mind adding support for hg18. what i did:

unzip ref files

for f in *_ref.txt.gz; do gunzip "$f"; done

convert to bed format (losing allele info!)

for f in *_ref.txt; do awk 'BEGIN{FS="\t";OFS="\t"}{print $1,$2,$2+1,$3}' "$f" > "${f/.txt/.bed}"; done
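A note on coordinates: BED intervals are 0-based and half-open, so a 1-based position p is normally written as start p-1, end p. The sketch below shows that convention on a tiny made-up file. It assumes the positions in the ref files are 1-based and that the columns are chrom, pos, rsID, allele (inferred from the awk command above, not confirmed). With this convention the position is recovered from the BED end column, not the start column.

```sh
# Made-up sample in the assumed layout: chrom, 1-based pos, rsID, ref allele
tmp=$(mktemp -d)
printf 'chr1\t1000\trs1\tA\nchr1\t2000\trs2\tG\n' > "$tmp/sample_hg19_ref.txt"

# 1-based position p becomes the 0-based half-open interval [p-1, p)
awk 'BEGIN{FS=OFS="\t"} {print $1, $2-1, $2, $3}' \
    "$tmp/sample_hg19_ref.txt" > "$tmp/sample_hg19_ref.bed"

# Going back to 1-based: the position is the BED end column ($3)
awk 'BEGIN{FS=OFS="\t"} {print $1, $3, $4}' \
    "$tmp/sample_hg19_ref.bed" > "$tmp/roundtrip.txt"

cat "$tmp/sample_hg19_ref.bed"
```

The file names and sample rows are hypothetical and only illustrate the coordinate arithmetic.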

liftOver to hg18

for f in *.bed; do ../liftOver/liftOver "$f" ../liftOver/hg19ToHg18.over.chain.gz "${f/hg19/hg18}" "${f/.bed/.unmapped.bed}"; done
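To see how many sites were dropped and why, the unmapped files can be summarised. liftOver typically writes a `#` comment line with the reason before each record it could not lift. The sketch below runs on a made-up stand-in file, so the file name and its contents are assumptions for illustration only.

```sh
tmp=$(mktemp -d)
# Made-up stand-in for one *.unmapped.bed file written by liftOver
printf '#Deleted in new\nchr1\t100\t101\trs1\n#Deleted in new\nchr2\t200\t201\trs2\n#Partially deleted in new\nchr3\t300\t301\trs3\n' \
    > "$tmp/sample.unmapped.bed"

# Number of sites that failed to lift (every non-comment line is one site)
n_unmapped=$(grep -vc '^#' "$tmp/sample.unmapped.bed")
echo "unmapped sites: $n_unmapped"

# Tally of the reasons given in the comment lines, most frequent first
reasons=$(grep '^#' "$tmp/sample.unmapped.bed" | sort | uniq -c | sort -rn)
echo "$reasons"
```

Run against the real `*.unmapped.bed` files, the same two commands would show whether the ~30k missing sites share one cause.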

re-convert to txt format and add allele info (which is assumed not to change) from hg19

for f in *_hg18_ref.bed; do awk 'BEGIN{FS="\t";OFS="\t"}{print $1,$2,$4}' "$f" | paste - <(cut -f4 "${f/hg18_ref.bed/hg19_ref.txt}") > "${f/.bed/.txt}"; done
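One caveat with the paste step: it pairs rows by line number, so once liftOver has dropped unmapped sites, the hg18 rows and the hg19 allele column may no longer line up. A possible variant, as a sketch only, looks the allele up by rsID instead. It assumes rsIDs are unique within a file and that the columns are chrom, pos, rsID, allele. The sample files are made up, and column 2 is kept as the position to match the command above.

```sh
tmp=$(mktemp -d)
# Made-up hg19 ref: chrom, pos, rsID, allele
printf 'chr1\t1000\trs1\tA\nchr1\t2000\trs2\tG\nchr1\t3000\trs3\tT\n' \
    > "$tmp/sample_hg19_ref.txt"
# Made-up lifted BED in which rs2 is missing, as if it had been unmapped
printf 'chr1\t990\t991\trs1\nchr1\t2990\t2991\trs3\n' \
    > "$tmp/sample_hg18_ref.bed"

# First file: remember the allele for each rsID.
# Second file: print each lifted row with the allele looked up by rsID.
awk 'BEGIN{FS=OFS="\t"}
     NR==FNR {allele[$3]=$4; next}
     ($4 in allele) {print $1, $2, $4, allele[$4]}' \
    "$tmp/sample_hg19_ref.txt" "$tmp/sample_hg18_ref.bed" \
    > "$tmp/sample_hg18_ref.txt"

cat "$tmp/sample_hg18_ref.txt"
```

In this sample, rs3 keeps its own allele T, whereas pairing by line number would have given it the G that belongs to the dropped rs2.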

re-zip ref files

for f in *_ref.txt; do gzip -9 "$f"; done

Feb 19 '18 14:02 tikacp
