gwas2vcf
benchmarks and speedups
Hello,
As a test case, I converted one FinnGen GWAS summary statistics file from TSV to VCF. gwas2vcf was run in a Singularity container with the following setup:
- finngen_R5_AB1_AMOEBIASIS.tsv (16380388 lines)
- dbSNP 155
- GRCh38 fasta
- Singularity 3.7.0
- Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz
Both the genomic FASTA and the dbSNP VCF used the same chromosome IDs (1-22, X, Y, MT), and both were indexed.
The output VCF looks like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT amoeb01
1 108391 rs1274919517 A G . PASS . ES:SE:LP:ID 0.3653:1.9411:0.0702236:rs1274919517
The run took 582.99 min in total (user time 577.51 min, system time 5.38 min).
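To put these timings in perspective, here is a minimal sketch of the arithmetic, using only the figures reported in this post. It assumes that the 582.99 min figure is wall-clock time and that the line count (which may include a header line) approximates the number of variants processed; the function names are my own, purely for illustration.

```python
# Figures copied from this post; nothing here is measured by the script itself.
TOTAL_LINES = 16_380_388   # lines in finngen_R5_AB1_AMOEBIASIS.tsv
WALL_MIN = 582.99          # total run time, assumed to be wall-clock minutes
USER_MIN = 577.51          # user CPU time in minutes
SYS_MIN = 5.38             # system CPU time in minutes


def throughput(lines: int, wall_minutes: float) -> float:
    """Average number of input lines processed per second."""
    return lines / (wall_minutes * 60)


def cpu_utilisation(user_minutes: float, sys_minutes: float, wall_minutes: float) -> float:
    """Fraction of the run time spent on the CPU (about 1.0 means one busy core)."""
    return (user_minutes + sys_minutes) / wall_minutes


if __name__ == "__main__":
    print(f"throughput: {throughput(TOTAL_LINES, WALL_MIN):.0f} lines/s")
    print(f"CPU utilisation: {cpu_utilisation(USER_MIN, SYS_MIN, WALL_MIN):.2%}")
```

Under those assumptions this works out to fewer than 500 lines per second, with CPU time almost equal to the total run time. That would be consistent with a single CPU-bound process rather than one waiting on disk I/O, although I have not profiled the run to confirm it.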
My questions:
- Is this in line with the running times you have observed, or is roughly 10 hours per file a fluke (i.e. too slow)?
- Would it be possible to speed up the processing?
Thank you
Darek Kedra