HLA_analyses_tutorial Tutorial

Hello,

Your tutorial includes the command below:

cat 1KGp3v5.miss.frq.diff.hwe.freq | sort -k1,1 | join - <(cat 1KGp3v5.bim | awk '{print $2, $1 "_" $4}' | sort -k1,1) | awk '{print $5, $2, $3, $4}' | sort -k1,1 > 1KGp3v5.Ref.Frq.chr_pos_allele

Could you please explain what this command is meant to do? When I ran it, it produced an extremely large file (>1 TB), so I am not sure whether it is the correct command.

Thank you in advance, Apiwat

Feb 22 '23 19:02 asangphukieo

I think it is because the 1KGp file that I downloaded has no rsIDs. I downloaded it from http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr6.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz
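For illustration, here is a minimal sketch of how missing rsIDs could cause this blow-up. It assumes the variant ID ends up as "." on every line of both inputs; the toy values and temporary file names are made up and are not real 1KG records. Because join pairs every left-hand line with every right-hand line that has the same key, n identical keys on each side yield n x n output lines.

```shell
# Toy data only (hypothetical values): every variant ID is "." (no rsID).
tmp=$(mktemp -d)
printf '. G T 0.1\n. T C 0.2\n. G A 0.3\n' > "$tmp/freq"   # ID, A1, A2, freq
printf '. 6_100\n. 6_200\n. 6_300\n' > "$tmp/map"          # ID, chr_pos

# join emits one line per matching pair of keys,
# so 3 x 3 identical keys give 9 output lines instead of 3.
join "$tmp/freq" "$tmp/map" > "$tmp/joined"
wc -l < "$tmp/joined"

# Quick diagnostic: number of distinct IDs that occur more than once.
dup_ids=$(cut -d' ' -f1 "$tmp/freq" | sort | uniq -d | wc -l)
echo "duplicated IDs: $dup_ids"
```

Running the same duplicate check on the first column of the real 1KGp3v5.miss.frq.diff.hwe.freq file should show whether the IDs are unique before attempting the join.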

Could you please provide the link to download 1KGp vcf file?

Best,

Feb 22 '23 20:02 asangphukieo

Hello,

I double-checked, and the current code generates these files without any issues. The 1KG file can be downloaded from the official FTP site; in the 20130502 release it is named "ALL.chr6.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz". This file should contain rsIDs in the ID column. You can also download it from UCSC: https://hgdownload-euro.soe.ucsc.edu/gbdb/hg19/1000Genomes/phase3/

The 1KGp3v5.miss.frq.diff.hwe.freq file should look like:

rs544586840 G T 0.000199681
rs561313667 T C 0.0666933
rs530120680 G A 0.0706869
rs540888038 G A 0.00159744
rs531186114 C CT 0.00139776
rs560786506 A G 0.000599042
rs532713529 T C 0.000399361
rs552527233 A G 0.000199681
rs571306991 T C 0.00159744
rs548119795 C CAG 0.0119808
...

and 1KGp3v5.Ref.Frq.chr_pos_allele should look like:

6_10000005 C T 0.2498
6_10000008 C A 0.000199681
6_100000105 C T 0.000199681
6_100000134 G A 0.0501198
6_100000143 T C 0.000199681
...
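To show what the command is doing, here is a small sketch of the same pipeline on toy inputs. It assumes the .bim file follows the usual PLINK column order (chromosome, variant ID, cM, position, allele 1, allele 2); the rsIDs, positions, and frequencies are invented, and intermediate files are used in place of process substitution so the steps are easier to follow. The effect is to replace each rsID key with a chr_pos key while keeping the alleles and frequency.

```shell
# Use byte-order collation so sort and join agree on key order.
export LC_ALL=C
tmp=$(mktemp -d)

# Toy inputs (hypothetical values, not real 1KG data).
printf 'rs2 T C 0.05\nrs1 G T 0.25\n' > "$tmp/freq"                 # ID, A1, A2, freq
printf '6\trs1\t0\t1000\tG\tT\n6\trs2\t0\t2000\tT\tC\n' > "$tmp/bim" # PLINK .bim layout

# Step 1: build an ID -> chr_pos lookup from the .bim file, sorted by ID.
awk '{print $2, $1 "_" $4}' "$tmp/bim" | sort -k1,1 > "$tmp/map"

# Step 2: join frequencies to the lookup on ID, then move chr_pos to column 1.
sort -k1,1 "$tmp/freq" \
  | join - "$tmp/map" \
  | awk '{print $5, $2, $3, $4}' \
  | sort -k1,1 > "$tmp/out"

cat "$tmp/out"
# 6_1000 G T 0.25
# 6_2000 T C 0.05
```

With unique IDs the output has exactly one line per variant present in both files, which is why the result should be about the same size as the input frequency file.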

Thanks, Saori

Aug 09 '24 21:08 saorisakaue
