vcftools bug interrupts execution under --joint_germline

Open jmgs7 opened this issue 3 months ago • 1 comment

Description of the bug

Hello there!

I just noticed that the vcftools TsTv counts step (VCFTOOLS_TSTV_COUNT) crashes with a segmentation fault; the error output is attached below. vcftools still writes the TsTv counts file, but the failed process stops the run and leaves the pipeline unfinished. I also ran the same command on its own in a different environment and got the same result. This looks like a bug in vcftools itself, but I wanted to report it here so you are aware of it.

I am also attaching the profile file, the Nextflow command I used to run the pipeline, the full run log, and the file generated by the command.

Thank you very much for your attention, best regards.

Command used and terminal output

#!/usr/bin/env bash -C -e -u -o pipefail
vcftools \
    --gzvcf joint_germline_recalibrated.vcf.gz \
    --out joint_germline_recalibrated \
    --TsTv-by-count \
     \


cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_TSTV_COUNT":
    vcftools: $(echo $(vcftools --version 2>&1) | sed 's/^.*VCFtools (//;s/).*//')
END_VERSIONS

--------- Terminal output -----------

VCFtools - 0.1.16
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--gzvcf joint_germline_recalibrated.vcf.gz
	--out joint_germline_recalibrated
	--TsTv-by-count

Using zlib version: 1.2.11
Warning: Expected at least 2 parts in FORMAT entry: ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">
Warning: Expected at least 2 parts in FORMAT entry: ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
Warning: Expected at least 2 parts in FORMAT entry: ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
Warning: Expected at least 2 parts in FORMAT entry: ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=culprit,Number=1,Type=String,Description="The annotation which was the worst performing in the Gaussian mixture model, likely the reason why the variant was filtered out">
After filtering, kept 3 out of 3 Individuals
Outputting Ts/Tv by Alternative Allele Count
/home/jose.gomez/Data/Genomics/WES-test/work/9d/5e8126b2f53d1614484be110c1b15e/.command.sh: line 7:    34 Segmentation fault      (core dumped) vcftools --gzvcf joint_germline_recalibrated.vcf.gz --out joint_germline_recalibrated --TsTv-by-count

Relevant files

joint_germline_recalibrated.TsTv.count.txt

netxflow.log

nf-params.json

System information

Nextflow v25.04.7 (conda installation) on Ubuntu Server 24.04.03. sarek 3.5.1

jmgs7, Sep 30 '25 11:09

I have experienced the same issue.

I managed to work around it by simply ignoring the error since, as you said, the output is actually correct.

Here is the custom config I'm using:

process {
  // Let the run continue even when VCFTOOLS_TSTV_COUNT exits with an error.
  withName: 'VCFTOOLS_TSTV_COUNT' {
    errorStrategy = 'ignore'
  }
}
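
Ignoring every failure of this process would also hide unrelated errors, such as a missing or truncated input VCF. A narrower variant, offered as a sketch rather than a tested fix, ignores the task only when its exit status is 139, the value bash conventionally reports for a segmentation fault (128 + signal 11). That the task exits with exactly 139 is an assumption here; the `.exitcode` file in the task's work directory shows the actual value.

```groovy
process {
  withName: 'VCFTOOLS_TSTV_COUNT' {
    // Assumption: the vcftools segfault surfaces as exit status 139 (128 + SIGSEGV).
    // Any other failure falls back to 'terminate', Nextflow's default strategy.
    errorStrategy = { task.exitStatus == 139 ? 'ignore' : 'terminate' }
  }
}
```

As with the config above, this can be supplied through Nextflow's `-c <file>` option.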

DennisSchwartz, Oct 23 '25 11:10