varianttools

KING pipeline does not work

Open BoPeng opened this issue 5 years ago • 5 comments

#141

I updated to vtools v3.1.2, Python 3.7.6, and KING 2.2.4.

As I see in this discussion, @gaow commented that KING has been updated. But I'm getting exactly the error mentioned in that issue.

I checked the changes to KING made by @gaow through this link: step king_41 now shows a .txt output, as opposed to .ped in an older version. But my error still says that the .ped file does not exist. I updated my software using bioconda, so I am not sure whether the change has been picked up there. Could that be causing this error? Meanwhile, the issue with the export command remains the same in my latest version. Could you please help with it? (I also tried exporting to .tped, without any luck.)

BoPeng avatar May 16 '20 17:05 BoPeng

I have trouble installing plink because the bioconda plink uses a version of gsl that conflicts with the conda-forge version of gsl used by vtools.

BoPeng avatar May 16 '20 18:05 BoPeng

vtools execute KING   --jobname dummy  --var_table pass_variants   --king_path ~/bin/   --plink_path ~/bin

INFO: Executing KING.king_0: Load specified snapshot if a snapshot is specified. Otherwise use the existing project.
INFO: Executing KING.king_10: Check the existence of KING and PLINK command.
INFO: Command /Users/bpeng/bin//king is located.
INFO: Command /Users/bpeng/bin/plink is located.
INFO: Executing KING.king_20: Write selected variant and samples in tped format
INFO: Running vtools export pass_variants --format tped --samples "1" | awk '{$2=$1"_"$4;$3=0;print $0}' > /Users/bpeng/vatlab/vtools/ticket147/.vtools_cache/dummy.tped
INFO: Executing KING.king_21: Rename tfam file to match tped file
INFO: Running mv pass_variants.tfam /Users/bpeng/vatlab/vtools/ticket147/.vtools_cache/dummy.tfam
INFO: Executing KING.king_30: Calculate LD pruning candidate list with a cutoff of R^2=0.5
INFO: Running /Users/bpeng/bin/plink --tped dummy.tped --tfam dummy.tfam --indep-pairwise 50 5 0.5 --allow-no-sex --out dummy.LD.50 under /Users/bpeng/vatlab/vtools/ticket147/.vtools_cache
INFO: Executing KING.king_31: LD pruning from pre-calculated list
INFO: Running /Users/bpeng/bin/plink --tped dummy.tped --tfam dummy.tfam --extract dummy.LD.50.prune.in --no-parents --no-sex --no-pheno --maf 0.01 --make-bed --out dummy under /Users/bpeng/vatlab/vtools/ticket147/.vtools_cache
INFO: Executing KING.king_41: Global ancestry inference
INFO: Running /Users/bpeng/bin//king -b dummy.bed --mds --prefix dummy- under /Users/bpeng/vatlab/vtools/ticket147/.vtools_cache
ERROR: Failed to execute step king_41: Output file /Users/bpeng/vatlab/vtools/ticket147/.vtools_cache/dummy-pc.txt does not exist after completion of the job.
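For readers unfamiliar with the awk one-liner in step king_20 above, here is a minimal Python sketch of what it appears to do to each tped line: replace the variant ID (column 2) with `chromosome_position` and set the genetic distance (column 3) to 0. The function name and the sample line are made up for illustration; this is not code from the pipeline itself.

```python
def rewrite_tped_line(line: str) -> str:
    """Mimic awk '{$2=$1"_"$4;$3=0;print $0}' for a single tped line."""
    fields = line.split()  # like awk's default splitting on runs of whitespace
    if len(fields) < 4:
        raise ValueError("a tped line needs at least 4 columns")
    # tped columns: chromosome, variant ID, genetic distance, bp position, genotypes...
    fields[1] = f"{fields[0]}_{fields[3]}"  # variant ID becomes chr_pos
    fields[2] = "0"                         # genetic distance set to 0
    return " ".join(fields)                 # awk rejoins modified records with single spaces


# Hypothetical input line, not taken from the dataset in this thread
print(rewrite_tped_line("1\t.\t0.5\t12345\tA A\tA G"))  # prints: 1 1_12345 0 12345 A A A G
```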
king
KING 2.2.4 - (c) 2010-2019 Wei-Min Chen

 plink

@----------------------------------------------------------@
|        PLINK!       |     v1.07      |   10/Aug/2009     |
|----------------------------------------------------------|
|  (C) 2009 Shaun Purcell, GNU General Public License, v2  |
|----------------------------------------------------------|
|  For documentation, citation & bug-report instructions:  |
|        http://pngu.mgh.harvard.edu/purcell/plink/        |
@----------------------------------------------------------@

So the problem seems to be with the version of king.
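One way to narrow this down is to look at what KING actually wrote into the cache directory, since the pipeline only checks for a fixed file name (dummy-pc.txt here). The sketch below is a generic helper, not part of vtools: the function names are hypothetical, and the directory and prefix in the usage note are taken from the log above.

```python
from pathlib import Path


def outputs_with_prefix(workdir, prefix):
    """Return the sorted names of regular files in workdir that start with prefix."""
    return sorted(
        p.name for p in Path(workdir).iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )


def missing_outputs(workdir, expected):
    """Return the expected file names that are not present in workdir."""
    return [name for name in expected if not (Path(workdir) / name).is_file()]
```

For example, `outputs_with_prefix(".vtools_cache", "dummy-")` lists everything written with the `--prefix dummy-` option, and `missing_outputs(".vtools_cache", ["dummy-pc.txt"])` reports whether the file the pipeline expects is absent. If KING wrote nothing at all, the run itself failed; if it wrote a differently named file, the pipeline and the KING version disagree on output names.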

BoPeng avatar May 16 '20 19:05 BoPeng

Error message is

Genotypes stored in 1 words for each of 26 individuals.
The number of individuals is < 1.

so the previous command plink --make-bed could be doing something wrong.

I would suggest that @enigmargs tries to understand what ~/.variant_tools/pipeline/KING.pipeline is trying to achieve and see if this is what is supposed to happen given this particular dataset.
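A quick sanity check on the output of `plink --make-bed` is to count the records in the binary fileset: the .fam file has one line per sample and the .bim file one line per variant. The sketch below assumes the `dummy` prefix and cache directory from the log; the function names are hypothetical. Whether an empty or truncated fileset is the actual cause here has not been established in this thread.

```python
from pathlib import Path


def count_records(path):
    """Count the non-empty lines in a text file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def summarize_fileset(workdir, prefix):
    """Count samples (.fam) and variants (.bim); None marks a missing file."""
    base = Path(workdir)
    summary = {}
    for label, suffix in (("samples", ".fam"), ("variants", ".bim")):
        target = base / f"{prefix}{suffix}"
        summary[label] = count_records(target) if target.is_file() else None
    return summary
```

Calling `summarize_fileset(".vtools_cache", "dummy")` after step king_31 shows whether the LD pruning and the `--maf 0.01` filter left any variants, and whether the sample count matches the 26 individuals that KING reports.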

BoPeng avatar May 16 '20 19:05 BoPeng

I tried executing the steps individually on:

  • a different (larger) dataset, which shows the same error as you quoted

  • KING v2.2.3, which shows the following: [screenshot: 223PNG]

  • KING v2.2.2, which shows the error below, I think in line with the WARNING on the KING website that LAPACK libraries are lacking below v2.2.3: [screenshot: 222]

Is there any specific reason to use the tfile (tped/tfam) format in export, instead of vcf, at king_20? Would it be advisable to calculate PCs using PLINK and then import them directly as phenotype fields?

I sincerely hope that I'm not dragging it too long!

enigmargs avatar May 22 '20 10:05 enigmargs

@enigmargs This pipeline does not only PCA but also relationship analysis, which is important in GWAS. Compared to PLINK, relationship analysis with KING is more robust to the presence of population structure. Because it performs pair-wise comparisons between individuals, it also works for small sample sizes, where good estimates of allele frequencies are hard to obtain.
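To illustrate what the relationship analysis yields, here is a small sketch that maps a pair-wise kinship coefficient to a relationship category, using the cut-offs given in the KING documentation (0.354, 0.177, 0.0884 and 0.0442). The function name is hypothetical and the pipeline itself does not contain this code; treat it as a reading aid for KING's kinship output.

```python
def relationship_from_kinship(phi: float) -> str:
    """Classify a pair by estimated kinship coefficient (cut-offs per the KING docs)."""
    if phi > 0.354:
        return "duplicate/MZ twin"
    if phi >= 0.177:
        return "1st-degree"
    if phi >= 0.0884:
        return "2nd-degree"
    if phi >= 0.0442:
        return "3rd-degree"
    return "unrelated or more distant"


# Expected kinship is 0.25 for parent-offspring or full siblings
print(relationship_from_kinship(0.25))  # prints: 1st-degree
```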

Perhaps for PCA/MDS analysis there is no major difference between these tools. Unfortunately we don't have a separate implementation for that.

It seems the failure is related to LAPACK installation?

gaow avatar May 22 '20 12:05 gaow