PyVCF icon indicating copy to clipboard operation
PyVCF copied to clipboard

Some problem with vcf.Writer

Open kpsimonlin opened this issue 7 years ago • 0 comments

I am using this package to extract certain records from the original vcf file. Recently I extracted synonymous sites and nonsynonymous sites from multiple vcf files. The code I was using is something like this:

vcf_reader = vcf.Reader(open('my_file.vcf.gz', 'r'))
vcf_writer = vcf.Writer(open('output.vcf', 'w'), vcf_reader)
for record in vcf_reader:
    if record.INFO['ExonicFunc.refGene'] == ['synonymous_SNV']:
        vcf_writer.write_record(record)

I noticed that the code generated some output files that can't be tabixed, one of the example error is

[E::hts_idx_push] Unsorted positions on sequence #28: 112840 followed by 1
tbx_index_build failed: the_output_file.vcf.gz

Therefore, I done some test on the vcf.Writer and found something strange. I tested the writer function by following code.

vcf_reader = vcf.Reader(open('my_file.vcf.gz', 'r'))
vcf_writer = vcf.Writer(open('output.vcf', 'w'), vcf_reader)
for record in vcf_reader.fetch('chrM', 0, 20000):
    print record.CHROM
    print record.POS
    vcf_writer.write_record(record)

The output of the code is like:

chrM
150
...
chrM
16173
chrM
16225
chrM
16268
chrM
16322

Then I checked the output.vcf file and found that the vcf.Writer didn't write all the record. Here is the last row of the output.vcf file.

chrM    16225   .       T       C       8073.77 .       AC=2;AF=1.0;AN=2;BaseQRankSum=1.676;DP=250;Dels=0.0;FS=0.0;HaplotypeScore=49.1099;MLEAC=2;MLEAF=1.0;MQ=59.84;MQ0=0;MQRankSum=1.704;QD=32.3;ReadPosRankSum=-0.842;AAChange.refGene=.;ALLELE_END;ANNOVAR_DATE=2014-07-22;ExonicFunc.refGene=.;Func.refGene=intergenic;Gene.refGene=NONE,NONE;GeneDetail.refGene=dist\x3eNONE\x3bdist\x3eNONE;LJB23_FATHMM_pred=.;LJB23_FATHMM_score=.;LJB23_FATHMM_score_converted=.;LJB23_GERP++=.;LJB23_LRT_pred=.;LJB23_LRT_score=.;LJB23_LRT_score_converted=.;LJB23_LR_pred=.;LJB23_LR_score=.;LJB23_MutationAsse

It seems that vcf.Writer ignores the last two record (and didn't completely write the record at position 16225), although it did enter the for loop.

Anyone has any idea how to solve this error?

kpsimonlin avatar Oct 02 '17 08:10 kpsimonlin