vcftools bug interrupts execution under --joint_germline
Description of the bug
Hello there!
I just noticed that the vcftools TsTv counts step crashes with the error messages attached below and stops the pipeline. Despite the error, the TsTv counts file is still generated, but execution is interrupted and the pipeline is left unfinished. I also ran the command on its own in a different environment and got the same result. This looks like a bug in vcftools itself, but I wanted to report it here so you are aware of it.
I am also attaching the profile file, the nextflow command I used to run the pipeline, the full run log, and the file generated by the command.
Thank you very much for your attention, best regards.
Command used and terminal output
#!/usr/bin/env bash -C -e -u -o pipefail
vcftools \
--gzvcf joint_germline_recalibrated.vcf.gz \
--out joint_germline_recalibrated \
--TsTv-by-count \
\
cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_TSTV_COUNT":
vcftools: $(echo $(vcftools --version 2>&1) | sed 's/^.*VCFtools (//;s/).*//')
END_VERSIONS
--------- Terminal output -----------
VCFtools - 0.1.16
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--gzvcf joint_germline_recalibrated.vcf.gz
--out joint_germline_recalibrated
--TsTv-by-count
Using zlib version: 1.2.11
Warning: Expected at least 2 parts in FORMAT entry: ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">
Warning: Expected at least 2 parts in FORMAT entry: ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
Warning: Expected at least 2 parts in FORMAT entry: ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
Warning: Expected at least 2 parts in FORMAT entry: ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=culprit,Number=1,Type=String,Description="The annotation which was the worst performing in the Gaussian mixture model, likely the reason why the variant was filtered out">
After filtering, kept 3 out of 3 Individuals
Outputting Ts/Tv by Alternative Allele Count
/home/jose.gomez/Data/Genomics/WES-test/work/9d/5e8126b2f53d1614484be110c1b15e/.command.sh: line 7: 34 Segmentation fault (core dumped) vcftools --gzvcf joint_germline_recalibrated.vcf.gz --out joint_germline_recalibrated --TsTv-by-count
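For anyone rerunning the command by hand: shells conventionally report a process killed by signal N as exit status 128 + N, and on Linux a segmentation fault is SIGSEGV (signal 11), so this crash should surface as exit status 139. Below is a minimal sketch for decoding a status. The function name `describe_exit_status` is made up for illustration and is not part of vcftools, Nextflow, or sarek.

```shell
#!/usr/bin/env bash
# describe_exit_status is a hypothetical helper, not part of vcftools or sarek.
# Shells report a process killed by signal N as exit status 128 + N,
# so a segmentation fault (SIGSEGV, signal 11 on Linux) shows up as 139.
describe_exit_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "success"
    elif [ "$status" -gt 128 ]; then
        # Subtract 128 to recover the signal number.
        echo "terminated by signal $((status - 128))"
    else
        echo "failed with exit code $status"
    fi
}
```

Running the vcftools command from this report in an interactive shell and then calling `describe_exit_status $?` would be expected to print `terminated by signal 11` if the same crash occurs.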
Relevant files
joint_germline_recalibrated.TsTv.count.txt
System information
Nextflow v25.04.7 (conda installation) on Ubuntu Server 24.04.3, sarek 3.5.1
I have experienced the same issue.
I managed to work around it by simply ignoring the error since, as you said, the output is actually correct.
Here is the custom config I'm using:
process {
    withName: 'VCFTOOLS_TSTV_COUNT' {
        // Let the pipeline continue even if this process exits with an error
        errorStrategy = 'ignore'
    }
}
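Since this workaround relies on a file written by a process that then crashed, a quick sanity check on that file may be worthwhile. The sketch below is only a heuristic: it tests that the file exists, is non-empty, and ends with a newline, on the assumption that a file cut off mid-write often does not. It cannot confirm that the counts themselves are right. The function name `tstv_output_looks_complete` is hypothetical.

```shell
#!/usr/bin/env bash
# tstv_output_looks_complete is a hypothetical helper, not part of vcftools or sarek.
# Heuristic only: returns 0 if the file exists, is non-empty, and ends with a
# newline; returns non-zero otherwise. It does not validate the counts.
tstv_output_looks_complete() {
    file=$1
    # -s is true only for an existing file with size greater than zero.
    [ -s "$file" ] || return 1
    # Command substitution strips a trailing newline, so an empty result
    # means the last byte of the file is a newline.
    [ -z "$(tail -c 1 "$file")" ]
}
```

It could be called with the path to the TsTv counts file in the task's work directory, for example `tstv_output_looks_complete path/to/counts_file && echo "looks complete"`, where the path is a placeholder.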