could not parse the input VCF
Hi, I want to call SNPs for a PacBio HiFi assembled haplotype genome using the following command: bcftools mpileup -f hg38.fa --threads 16 -O v $file | bcftools call --ploidy 1 -mv -o ../SNP/$vcf
The following information was reported:
[mpileup] maximum number of reads per input file set to -d 250
[W::vcf_parse_info] INFO 'MQ0' is not defined in the header, assuming Type=String
Error: could not parse the input VCF

[mpileup] maximum number of reads per input file set to -d 250
[W::vcf_parse] Contig 'chrUn_KI270' is not defined in the header. (Quick workaround: index the file with tabix.)
Error: could not parse the input VCF

[mpileup] maximum number of reads per input file set to -d 250
[W::vcf_parse_filter] FILTER '' is not defined in the header
Error: could not parse the input VCF
The contigs were mapped to hg38 using minimap2: minimap2 -t 12 -ax asm5 hg38.fa ./assemble/$file > ./AlignedSam/$out
I tried restarting some of the failed runs and received complete SNP files.
That's very odd. What version of bcftools are you running? What is the output of
bcftools --version
Hi, thank you for your reply. I am using bcftools 1.17. I changed the command line to:
bcftools mpileup -f hg38.fa --threads 15 -o ../SNP/$nvcf -O v $file
bcftools call -c -v ../SNP/$nvcf --ploidy 1 -o ../SNP/
And I got a new error message: Wrong number of PL fields? nals=0 npl=-3
I rechecked the VCF files and found that they were still incomplete.
I also tried bcftools 1.15 with samtools 1.15 and bcftools 1.20 with samtools 1.20, but got similar error messages and truncated VCF files.
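A rough way to spot a truncated plain-text VCF before passing it to bcftools call is to check that the file is non-empty and ends with a newline. This is only a heuristic sketch: it applies to uncompressed output (-O v), a file cut off exactly at a line boundary would still pass, and ends_with_newline is a hypothetical helper name, not part of bcftools.

```shell
# Hypothetical helper: succeed only if the file is non-empty and its
# last byte is a newline. A write that was cut off mid-record usually
# leaves a partial last line without one.
ends_with_newline() {
    [ -s "$1" ] || return 1
    # Command substitution strips a trailing newline, so an empty
    # result means the last byte was a newline.
    [ -z "$(tail -c 1 "$1")" ]
}
```

For example, `ends_with_newline ../SNP/$nvcf || echo "possibly truncated"` could be run between the mpileup and call steps.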
Still very odd. In order to debug this, can you run
bcftools mpileup -f hg38.fa -o test.vcf -Ov $file
and share the output, possibly narrowing it down to a few problematic sites? Also it would be good to do this with the latest version 1.21.
If the data is sensitive, please use my email address (on my profile page)
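One possible way to narrow the problem down is to run the same mpileup command one region at a time and note the first region where it exits with an error. The sketch below assumes the BAM is coordinate-sorted and indexed (required for -r) and that $file holds the BAM path as in the commands above; first_failing_region and mpileup_region are hypothetical helper names, and the region list is up to you.

```shell
# Hypothetical helper: run a command once per region and print the
# first region for which it exits non-zero. Not part of bcftools.
first_failing_region() {
    cmd=$1
    shift
    for region in "$@"; do
        if ! "$cmd" "$region"; then
            echo "$region"
            return 1
        fi
    done
    return 0
}

# Per-region mpileup run; writes one uncompressed VCF per region.
# Assumes $file is an indexed BAM and hg38.fa is the reference.
mpileup_region() {
    bcftools mpileup -f hg38.fa -r "$1" -Ov -o "test.$1.vcf" "$file"
}
```

A call such as `first_failing_region mpileup_region chr1 chr2 chr3` would then print the first chromosome that fails, which can be split further into smaller intervals. It will not help if the program hangs rather than exits.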
Hi, I reran bcftools with different amounts of RAM and found that the incomplete output files were caused by too little memory. After I increased the RAM (~150G), the output finished properly.
Wow, 150GB? Usually the program works with very small memory requirements. I wonder what your data are like to require so much memory.
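If memory is the suspect, it may help to look at the exit status of each stage rather than only at the output file. A process killed by a signal is reported by the shell as 128 plus the signal number, so 137 means SIGKILL, which is what the Linux out-of-memory killer sends. The sketch below only interprets a status code; describe_status is a hypothetical helper name, and the bash usage in the comments assumes the pipeline from the first post.

```shell
# Hypothetical helper: turn an exit status into a short description.
describe_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "ok"
    elif [ "$status" -gt 128 ]; then
        # Shells report death by signal N as 128+N (137 = SIGKILL).
        echo "killed by signal $((status - 128))"
    else
        echo "failed with exit status $status"
    fi
}

# Possible usage in bash, where PIPESTATUS holds one status per stage:
#   bcftools mpileup -f hg38.fa -O v "$file" | bcftools call --ploidy 1 -mv -o out.vcf
#   st=("${PIPESTATUS[@]}")
#   echo "mpileup: $(describe_status "${st[0]}")"
#   echo "call:    $(describe_status "${st[1]}")"
```

Where GNU time is installed, running a stage under `/usr/bin/time -v` also reports the maximum resident set size, which gives a direct reading of peak memory.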
I am calling SNPs for PacBio HiFi assembled human contigs (~3G). I recently tried calling SNPs for Drosophila assembled contigs, but a similar error occurred and the VCF file was truncated again.
Hi, I am calling SNPs for 145 PacBio HiFi assembled human genomes using the following command:
bcftools mpileup -Ou -f hg38.fa -b BamList.txt -r chr1:1-2000000 | bcftools call -cv --ploidy 1 -o out.call.vcf
The program always gets stuck at the same position, even when I run it on different platforms with different amounts of RAM or change the version of bcftools (1.22, 1.17, 1.10, 1.8). I also got a lot of truncated VCF files for other chromosomes. I have prepared a small test BAM .tar.gz of about 90M and can send it if you need it. Any help will be appreciated.
You may wish to try bcftools mpileup -X pacbio-ccs to select options optimised for HiFi data. There's variability of course between organisms and sequencing runs, but I would expect an error model tuned for HiFi to be preferable to the default Illumina one. One key thing this also does is to disable BAQ, which can be very slow, especially with long reads.
This doesn't explain truncated outputs though. That should never happen unless it's simply running out of memory or crashing for some reason.
Also, test data is always useful for reproducing problems. Thanks
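For reference, the multi-sample command posted above with the suggested profile added might look like the sketch below. Only -X pacbio-ccs is new; the other options are copied from the original command. call_region, REF and BAMLIST are hypothetical names used for illustration, and it is worth checking the usage output of `bcftools mpileup` to confirm that your version supports -X.

```shell
# Hypothetical wrapper around the pipeline posted above; call_region,
# REF and BAMLIST are illustrative names, not bcftools features.
call_region() {
    region=$1
    out=$2
    # -X pacbio-ccs selects the HiFi profile suggested in the reply;
    # the remaining options are unchanged from the original command.
    bcftools mpileup -Ou -X pacbio-ccs \
        -f "${REF:-hg38.fa}" -b "${BAMLIST:-BamList.txt}" -r "$region" |
        bcftools call -cv --ploidy 1 -o "$out"
}
```

It could then be invoked as `call_region chr1:1-2000000 out.call.vcf`.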