HLA_analyses_tutorial
Tutorial
Hello,
According to your tutorial, the following command builds 1KGp3v5.Ref.Frq.chr_pos_allele:
cat 1KGp3v5.miss.frq.diff.hwe.freq | sort -k1,1 | join - <(cat 1KGp3v5.bim | awk '{print $2, $1 "_" $4}' | sort -k1,1) | awk '{print $5, $2, $3, $4}' | sort -k1,1 > 1KGp3v5.Ref.Frq.chr_pos_allele
Could you please explain what this command is meant to do? When I ran it, it produced a very large file (>1 TB), so I am not sure whether the command is correct.
Thank you in advance, Apiwat
I think this happens because the 1KGp file that I downloaded has no rsIDs. I downloaded it from http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr6.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz
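A likely explanation, sketched with made-up toy data and assuming the missing IDs show up as "." in both the frequency file and the .bim file: `join` emits one output line for every pair of lines that share a key. If every key is ".", every line of one file pairs with every line of the other (a Cartesian product), so N and M input lines give N × M output lines, which can reach the terabyte range for a whole chromosome. The function name `demo_join_blowup` and the toy files are hypothetical.

```shell
#!/bin/sh
# Toy demonstration (hypothetical data): when every ID is ".", join pairs
# every line of file 1 with every line of file 2.
demo_join_blowup() {
  tmp=$(mktemp -d)
  # 3 frequency rows and 3 .bim-derived rows, all with the missing ID "."
  printf '. G T 0.1\n. T C 0.2\n. G A 0.3\n' > "$tmp/freq"
  printf '. 6_100\n. 6_200\n. 6_300\n' > "$tmp/bim_keys"
  # Same kind of join as the tutorial command; count the output lines
  join "$tmp/freq" "$tmp/bim_keys" | wc -l | tr -d ' '
  rm -r "$tmp"
}

demo_join_blowup   # prints 9 (3 x 3), not 3
```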
Could you please provide the link to download the 1KGp VCF file?
Best,
Hello,
I double-checked, and the current code generates these files without any issue. The 1KG file can be downloaded from the official FTP site; the 20130502 release file is named "ALL.chr6.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz". This file should contain rsIDs in the ID column. You can also download it from UCSC: https://hgdownload-euro.soe.ucsc.edu/gbdb/hg19/1000Genomes/phase3/
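To check whether a downloaded VCF actually carries rsIDs before running the pipeline, one option is to count the records whose ID column (column 3) is "." versus filled in. This is a minimal sketch: `count_vcf_ids` is a hypothetical helper, and it decompresses and scans the whole file, which can take a while for chr6.

```shell
#!/bin/sh
# Count VCF records with and without an ID in column 3 (hypothetical helper).
# Usage: count_vcf_ids file.vcf.gz
count_vcf_ids() {
  # Decompress, skip header lines starting with "#", and tally column 3
  gzip -dc "$1" | awk -F'\t' '
    !/^#/ { if ($3 == ".") missing++; else present++ }
    END   { printf "with_id=%d missing_id=%d\n", present, missing }'
}
```

For example: count_vcf_ids ALL.chr6.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz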
The 1KGp3v5.miss.frq.diff.hwe.freq file should look like:
rs544586840 G T 0.000199681
rs561313667 T C 0.0666933
rs530120680 G A 0.0706869
rs540888038 G A 0.00159744
rs531186114 C CT 0.00139776
rs560786506 A G 0.000599042
rs532713529 T C 0.000399361
rs552527233 A G 0.000199681
rs571306991 T C 0.00159744
rs548119795 C CAG 0.0119808
...
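Because `join` emits one output line per matching pair of lines, a quick sanity check before running the full command is to count the IDs that appear more than once, in column 1 of the frequency file and in column 2 of the .bim file. A handful of duplicates is not necessarily a problem, but a very large count, especially one that includes ".", suggests the input has no rsIDs. `count_duplicate_ids` is a hypothetical helper name.

```shell
#!/bin/sh
# Count distinct IDs that occur more than once in a given column (hypothetical helper).
# Usage: count_duplicate_ids FILE COLUMN
count_duplicate_ids() {
  # Extract the column, then count values that uniq -d reports as repeated
  awk -v col="$2" '{ print $col }' "$1" | sort | uniq -d | wc -l | tr -d ' '
}
```

For example: count_duplicate_ids 1KGp3v5.miss.frq.diff.hwe.freq 1 and count_duplicate_ids 1KGp3v5.bim 2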
and 1KGp3v5.Ref.Frq.chr_pos_allele should look like:
6_10000005 C T 0.2498
6_10000008 C A 0.000199681
6_100000105 C T 0.000199681
6_100000134 G A 0.0501198
6_100000143 T C 0.000199681
...
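To answer the original question of what the command does, here is the same pipeline restated as a commented function. It sorts the frequency table by rsID, joins it with (rsID, CHR_POS) pairs built from .bim columns 2, 1 and 4, and rewrites each row as CHR_POS A1 A2 FREQ. This is a sketch, not the tutorial's code: `make_ref_frq` is a hypothetical name, and it writes the .bim keys to a temporary file instead of using <(...) so that it also runs under a plain POSIX shell.

```shell
#!/bin/sh
# Commented restatement of the tutorial command (hypothetical function name).
# $1: frequency file with columns ID A1 A2 FREQ
# $2: PLINK .bim file (CHR ID CM POS A1 A2)
# $3: output file
make_ref_frq() {
  freq=$1; bim=$2; out=$3
  keys=$(mktemp)
  # Build "ID CHR_POS" pairs from .bim columns 2, 1 and 4, sorted on ID
  awk '{print $2, $1 "_" $4}' "$bim" | sort -k1,1 > "$keys"
  # Sort the frequency table on ID so join can merge the two files
  sort -k1,1 "$freq" |
    # join prints "ID A1 A2 FREQ CHR_POS" for every pair of lines sharing an ID
    join - "$keys" |
    # Reorder to "CHR_POS A1 A2 FREQ"
    awk '{print $5, $2, $3, $4}' |
    # Sort on CHR_POS as text (6_10000005 sorts before 6_100000105)
    sort -k1,1 > "$out"
  rm -f "$keys"
}
```

For example: make_ref_frq 1KGp3v5.miss.frq.diff.hwe.freq 1KGp3v5.bim 1KGp3v5.Ref.Frq.chr_pos_allele. If join warns that its input is not sorted, running sort and join under the same locale (for example LC_ALL=C) usually resolves it.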
Thanks, Saori