gwas2vcf

benchmarks and speedups

Open darked89 opened this issue 3 years ago • 8 comments

Hello,

As a test case, I converted one FinnGen GWAS summary statistics file from TSV to VCF. I ran gwas2vcf in a Singularity container with the following setup:

  • finngen_R5_AB1_AMOEBIASIS.tsv (16,380,388 lines)
  • dbSNP 155
  • GRCh38 FASTA
  • Singularity 3.7.0
  • Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz

Both the genome FASTA and the dbSNP VCF use the same chromosome IDs (1-22, X, Y, MT), and both were indexed, etc.

The output VCF looks like this:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  amoeb01
1       108391  rs1274919517    A       G       .       PASS    .       ES:SE:LP:ID     0.3653:1.9411:0.0702236:rs1274919517
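Each key in the FORMAT column (ES:SE:LP:ID) pairs with the colon-separated value at the same position in the sample column. A minimal Python sketch that splits the record above into named fields, assuming tab-separated columns and the GWAS-VCF convention that LP stores -log10(p-value), so the p-value can be recovered from it. The function name `parse_sample` is hypothetical and not part of gwas2vcf:

```python
# Parse one GWAS-VCF data line (tab-separated) into its per-sample fields.
# Assumption: LP holds -log10(p-value), following the GWAS-VCF convention.
def parse_sample(line: str) -> dict:
    cols = line.rstrip("\n").split("\t")
    keys = cols[8].split(":")      # FORMAT column, e.g. ES:SE:LP:ID
    values = cols[9].split(":")    # first (and here only) sample column
    fields = dict(zip(keys, values))
    # Convert the numeric fields to floats; ID stays a string.
    for key in ("ES", "SE", "LP"):
        if key in fields:
            fields[key] = float(fields[key])
    # Recover the p-value from its -log10 form.
    if "LP" in fields:
        fields["P"] = 10 ** -fields["LP"]
    return fields


record = "\t".join([
    "1", "108391", "rs1274919517", "A", "G", ".", "PASS", ".",
    "ES:SE:LP:ID", "0.3653:1.9411:0.0702236:rs1274919517",
])
print(parse_sample(record))
```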

The run took 582.99 min of wall-clock time (user time 577.51 min, system time 5.38 min).
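The timings above can be turned into an average throughput and a rough CPU-utilisation figure. User plus system time (582.89 min) is almost identical to the wall-clock time (582.99 min), which suggests the run kept about one core busy throughout, i.e. the conversion appears to be single-threaded and CPU-bound. A small Python check of this back-of-the-envelope arithmetic, using only the numbers reported here:

```python
# Back-of-the-envelope numbers from the reported run.
lines = 16_380_388          # rows in finngen_R5_AB1_AMOEBIASIS.tsv
wall_min = 582.99           # elapsed (wall-clock) minutes
user_min, sys_min = 577.51, 5.38

# Average rows processed per second of wall-clock time.
rows_per_sec = lines / (wall_min * 60)
# CPU time divided by wall time ~ average number of cores kept busy.
cores_busy = (user_min + sys_min) / wall_min

print(f"{rows_per_sec:.0f} rows/s, {cores_busy:.3f} cores busy on average")
```

This works out to roughly 468 rows per second with about one core busy on average.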

My questions:

  1. Is this in line with the running times you have observed, or is ~10 hours per file a fluke (i.e. too slow)?
  2. Would it be possible to speed up processing?

Thank you

Darek Kedra

darked89, Jul 02 '21 08:07