
could not parse the input VCF

Open · BrendaLee1 opened this issue 8 months ago · 8 comments

Hi, I want to call SNPs for a PacBio HiFi-assembled haplotype genome using the following command:

bcftools mpileup -f hg38.fa --threads 16 -O v $file | bcftools call --ploidy 1 -mv -o ../SNP/$vcf

The following errors were reported:

[mpileup] maximum number of reads per input file set to -d 250
[W::vcf_parse_info] INFO 'MQ0' is not defined in the header, assuming Type=String
Error: could not parse the input VCF

[mpileup] maximum number of reads per input file set to -d 250
[W::vcf_parse] Contig 'chrUn_KI270' is not defined in the header. (Quick workaround: index the file with tabix.)
Error: could not parse the input VCF

[mpileup] maximum number of reads per input file set to -d 250
[W::vcf_parse_filter] FILTER '' is not defined in the header
Error: could not parse the input VCF

The contigs were mapped to hg38 using minimap2:

minimap2 -t 12 -ax asm5 hg38.fa ./assemble/$file > ./AlignedSam/$out

When I restarted some of the failed jobs, I received complete SNP files.

BrendaLee1, Apr 29 '25

That's very odd. What version of bcftools are you running? What is the output of

bcftools --version

pd3, Apr 30 '25

Hi, thank you for your reply. The bcftools version I use is 1.17. I changed the command line to:

bcftools mpileup -f hg38.fa --threads 15 -o ../SNP/$nvcf -O v $file
bcftools call -c -v ../SNP/$nvcf --ploidy 1 -o ../SNP/

I then got a new error message:

Wrong number of PL fields? nals=0 npl=-3

I rechecked the VCF files and found that they were still incomplete.

I also tried bcftools 1.15 with samtools 1.15, and bcftools 1.20 with samtools 1.20, but got similar error messages and truncated VCF files.

BrendaLee1, May 01 '25
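Since the output files here were being cut off, one quick way to tell a complete VCF from a truncated one before passing it on is a small check like the following. This is a minimal sketch, not a bcftools feature: the function name vcf_is_complete is hypothetical, and it assumes an uncompressed VCF (as produced with -O v). It only checks for a trailing newline, a #CHROM header line, and the eight mandatory columns on every record, so passing it does not prove the file is valid.

```shell
# vcf_is_complete FILE  (hypothetical helper, not part of bcftools)
# Returns 0 if FILE looks complete, non-zero (with a message on stderr) if not.
vcf_is_complete() {
    local f="$1"
    # $(...) strips trailing newlines, so a non-empty result means the last
    # byte is not a newline, i.e. the final record was probably cut off.
    if [ -n "$(tail -c 1 "$f")" ]; then
        echo "no trailing newline: $f" >&2
        return 1
    fi
    # Require a #CHROM header and the 8 mandatory columns
    # (CHROM POS ID REF ALT QUAL FILTER INFO) on every non-header line.
    awk -F'\t' '
        /^#CHROM/ { hdr = 1 }
        !/^#/ && NF < 8 {
            printf("line %d has only %d columns\n", NR, NF) > "/dev/stderr"
            bad = 1
            exit
        }
        END {
            if (!hdr) { print "no #CHROM header line" > "/dev/stderr"; bad = 1 }
            exit bad
        }
    ' "$f"
}
```

For example, `vcf_is_complete ../SNP/$nvcf || echo "truncated: $nvcf"` could be run before the bcftools call step.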

Still very odd. In order to debug this, can you run

bcftools mpileup -f hg38.fa -o test.vcf -Ov $file

and share the output, possibly narrowing it down to a few problematic sites? Also, it would be good to do this with the latest version, 1.21.

If the data is sensitive, please use my email address (on my profile page).

pd3, May 05 '25

Hi, I reran bcftools with different amounts of RAM and found that the incomplete output files were caused by insufficient memory. After I increased the RAM to ~150 GB, the output finished properly.

BrendaLee1, May 12 '25

Wow, 150 GB? The program usually needs very little memory. I wonder what your data are like that they require so much.

pd3, May 12 '25

I am calling SNPs for PacBio HiFi-assembled human contigs (~3 Gb). Recently I tried calling SNPs for assembled Drosophila contigs, but a similar error occurred and the VCF file was truncated again.

BrendaLee1, May 17 '25

Hi, I am calling SNPs for 145 PacBio HiFi-assembled human genomes using the following command:

bcftools mpileup -Ou -f hg38.fa -b BamList.txt -r chr1:1-2000000 | bcftools call -cv --ploidy 1 -o out.call.vcf

The program always gets stuck at the same position, even when I run it on different platforms with different amounts of RAM or with different versions of bcftools (1.22, 1.17, 1.10, 1.8). I also got many truncated VCF files for other chromosomes. I have prepared a small test BAM archive (.tar.gz, about 90 MB) and can send it if you need it. Any help would be appreciated.

BrendaLee1, Jul 10 '25
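When a run hangs at the same position, splitting the job into smaller regions can help isolate the window that stalls while letting the other windows finish. A minimal sketch, assuming a samtools faidx index (hg38.fa.fai) is available: make_windows is a hypothetical helper, not part of bcftools or samtools, and the window size is arbitrary.

```shell
# make_windows FAI WINDOW_SIZE  (hypothetical helper)
# Print chrom:start-end regions (1-based, inclusive) tiling every contig in a
# samtools faidx index, where column 1 is the contig name and column 2 its length.
make_windows() {
    awk -v w="$2" -F'\t' '{
        for (s = 1; s <= $2; s += w) {
            e = s + w - 1
            if (e > $2) e = $2      # clip the last window to the contig end
            printf "%s:%d-%d\n", $1, s, e
        }
    }' "$1"
}

# Example (hypothetical output names): call each 2 Mb window of chr1 separately,
# so a window that hangs can be identified and rerun on its own.
# make_windows hg38.fa.fai 2000000 | grep '^chr1:' | while read -r r; do
#     bcftools mpileup -Ou -f hg38.fa -b BamList.txt -r "$r" \
#         | bcftools call -cv --ploidy 1 -o "out.$r.vcf"
# done
```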

You may wish to try bcftools mpileup -X pacbio-ccs to select options optimised for HiFi data. There's variability between organisms and sequencing runs, of course, but I would expect it to match the HiFi error model better than the default Illumina one. One key thing this also does is disable BAQ, which can be very slow, especially with long reads.

This doesn't explain truncated outputs though. That should never happen unless it's simply running out of memory or crashing for some reason.

Also, test data is always useful for reproducing problems. Thanks.

jkbonfield, Jul 10 '25
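The -X pacbio-ccs preset suggested above could be combined with the haploid calling command used in this thread roughly as follows. This is a sketch under stated assumptions: call_hifi is a hypothetical wrapper; the -X presets may not exist in older releases such as 1.8 or 1.10; and -mv is used here where some runs above used -cv. Running the pipe in a subshell with set -o pipefail is a design choice: a plain pipe reports only the exit status of bcftools call, so if mpileup crashes or is killed (for example for lack of memory), the truncated output could otherwise go unnoticed.

```shell
# call_hifi REF BAMLIST REGION OUT  (hypothetical wrapper, a sketch only)
call_hifi() {
    local ref="$1" bams="$2" region="$3" out="$4"
    # Subshell keeps pipefail from leaking into the caller's shell. With
    # pipefail, a failing mpileup makes the whole pipeline return non-zero.
    (
        set -o pipefail
        # -X pacbio-ccs: mpileup preset for HiFi/CCS reads (disables BAQ).
        bcftools mpileup -X pacbio-ccs -Ou -f "$ref" -b "$bams" -r "$region" \
            | bcftools call -mv --ploidy 1 -o "$out"
    )
}

# Example:
# call_hifi hg38.fa BamList.txt chr1:1-2000000 out.call.vcf || echo "pipeline failed" >&2
```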