PyVCF
FILTER line is malformed
Background:
In the FILTER column, multiple filters should be separated by semicolons. The widely used, but no longer actively maintained, VarScan2 genomic variant caller uses commas instead. Moreover, VarScan2 does not add ##FILTER metadata for most of its filters. Picard FixVcfHeader can be used to add the missing FILTER metadata. A "fixed" metadata row looks like this:
```
##FILTER=<ID="RefAvgRL,VarAvgRL",Description="Missing description: this FILTER line was added by Picard's FixVCFHeader">
```
Error:
PyVCF fails with:
```
Traceback (most recent call last):
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 236, in <module>
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 232, in main
    run(parser.parse_args())
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 166, in run
    df_1 = vcf_to_dataframe(args.vcf_1)
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 74, in vcf_to_dataframe
    vcf_reader = vcf.Reader(open(vcf_file, "r"))
  File "/home/myourshaw/.venv/dnanexus/lib/python3.10/site-packages/vcf/parser.py", line 300, in __init__
    self._parse_metainfo()
  File "/home/myourshaw/.venv/dnanexus/lib/python3.10/site-packages/vcf/parser.py", line 326, in _parse_metainfo
    key, val = parser.read_filter(line)
  File "/home/myourshaw/.venv/dnanexus/lib/python3.10/site-packages/vcf/parser.py", line 142, in read_filter
    raise SyntaxError(
SyntaxError: One of the FILTER lines is malformed: ##FILTER=<ID="RefAvgRL,VarAvgRL",Description="Missing description: this FILTER line was added by Picard's FixVCFHeader">
```
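To see why the Picard-generated line is rejected, here is a minimal reproduction. It assumes the FILTER header regex is the one quoted in the "Possible pull request" section of this issue, with its original `[^,]+` ID sub-pattern. `read_filter` here is a simplified, hypothetical stand-in for PyVCF's method, not its actual code.

```python
import re

# Assumption: the FILTER header regex as quoted in this issue, with the
# original ID sub-pattern [^,]+ (the ID may not contain a comma).
ORIGINAL_FILTER_PATTERN = re.compile(
    r'''\#\#FILTER=<
        ID=(?P<id>[^,]+),\s*
        Description="(?P<desc>[^"]*)"
        >''',
    re.VERBOSE,
)

# The header row written by Picard FixVcfHeader for VarScan2's comma-joined filter.
PICARD_LINE = (
    '##FILTER=<ID="RefAvgRL,VarAvgRL",'
    'Description="Missing description: this FILTER line was added by '
    "Picard's FixVCFHeader\">"
)


def read_filter(line):
    """Hypothetical simplified parser: return (id, description) or raise."""
    match = ORIGINAL_FILTER_PATTERN.match(line)
    if match is None:
        raise SyntaxError(f"One of the FILTER lines is malformed: {line}")
    return match.group("id"), match.group("desc")


# A conventional header line parses fine...
print(read_filter('##FILTER=<ID=q10,Description="Quality below 10">'))
# ...but the Picard line does not match at all.
print(ORIGINAL_FILTER_PATTERN.match(PICARD_LINE))
```

`[^,]+` stops at the comma inside the quoted ID, so the regex then expects `Description=` and instead finds `VarAvgRL"`; the match fails and the `SyntaxError` path is taken.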
Issue:
It might be more robust for PyVCF to treat a filter with commas as just one big filter name, as Picard FixVcfHeader does.
Instead of raising an exception, accept metadata with a filter ID inside double quotes and containing commas, e.g., `ID="RefAvgRL,VarAvgRL"`.
Similarly, in the data, treat a FILTER value like `RefAvgRL,VarAvgRL` as a single entity. I think this solution is consistent with the VCF 4.2 spec for a filter name: "String, no whitespace or semicolons permitted".
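A minimal sketch of the proposed data-side behavior: split the FILTER column only on semicolons, so a comma-joined VarScan2 value stays one name, and reject names containing whitespace, following the VCF 4.2 rule quoted above. The function name `parse_filter_column` is hypothetical, and returning `None` for `.` and an empty list for `PASS` is an assumed convention for this example.

```python
from typing import List, Optional


def parse_filter_column(value: str) -> Optional[List[str]]:
    """Hypothetical helper: split a VCF FILTER column value into filter names.

    Only semicolons separate filters, so "RefAvgRL,VarAvgRL" is kept whole.
    """
    if value == ".":
        return None  # assumed convention: filters were not applied
    if value == "PASS":
        return []  # assumed convention: all filters passed
    names = value.split(";")
    for name in names:
        # VCF 4.2: a filter name is a string with no whitespace or semicolons
        # (semicolons are already consumed by split), and it must not be empty.
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid filter name: {name!r}")
    return names


print(parse_filter_column("RefAvgRL,VarAvgRL"))  # ['RefAvgRL,VarAvgRL']
print(parse_filter_column("q10;s50"))            # ['q10', 's50']
```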
Possible pull request:
This hack (changing `[^,]+` to `.+`) worked to get me through an urgent analysis, but it may not be the best solution. At parser.py line 142:
```python
# ID sub-pattern changed from [^,]+ to .+ so the filter ID may contain commas
self.filter_pattern = re.compile(r'''\#\#FILTER=<
    ID=(?P<id>.+),\s*
    Description="(?P<desc>[^"]*)"
    >''', re.VERBOSE)
```
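The hack can be checked in isolation. This sketch applies the relaxed `.+` ID pattern and, as an extra assumption that is not part of the original hack, strips one pair of surrounding double quotes so the header ID can be compared with the unquoted value that appears in the FILTER column. `read_filter_relaxed` is a hypothetical name.

```python
import re

# The relaxed pattern from the hack: .+ instead of [^,]+ for the ID.
RELAXED_FILTER_PATTERN = re.compile(
    r'''\#\#FILTER=<
        ID=(?P<id>.+),\s*
        Description="(?P<desc>[^"]*)"
        >''',
    re.VERBOSE,
)


def read_filter_relaxed(line):
    """Hypothetical helper: parse a FILTER header, tolerating commas in the ID."""
    match = RELAXED_FILTER_PATTERN.match(line)
    if match is None:
        raise SyntaxError(f"One of the FILTER lines is malformed: {line}")
    filter_id = match.group("id")
    # Assumption: strip one pair of surrounding quotes (as Picard writes them)
    # so the ID equals the bare FILTER value, e.g. RefAvgRL,VarAvgRL.
    if len(filter_id) >= 2 and filter_id[0] == filter_id[-1] == '"':
        filter_id = filter_id[1:-1]
    return filter_id, match.group("desc")


line = '##FILTER=<ID="RefAvgRL,VarAvgRL",Description="added by FixVCFHeader">'
print(read_filter_relaxed(line))  # ('RefAvgRL,VarAvgRL', 'added by FixVCFHeader')
```

Because `.+` is greedy, the regex backtracks to the last comma that is followed by `Description="..."` and the closing `>`. The line must still end right after `Description`, so a header carrying extra keys (for example `Source`) is still rejected.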
I have the same problem. Any update on this issue?
I hoped switching to PyVCF3 (cf. #335) would solve the issue, but apparently not.
My bad: in my case the problem originated from a `Source` tag in a `FILTER` field:
```
##FILTER=<ID=xxx,Description="yyy",Source="zzz">
```
which is an `INFO` field tag according to https://samtools.github.io/hts-specs/, not a `FILTER` field tag.
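One way to spot headers like this before PyVCF sees them is to list any keys other than `ID` and `Description` in each `##FILTER` line, since those are the two keys the VCF 4.2 spec defines for FILTER lines. This is a rough sketch under the assumption that every value is either a double-quoted string or a bare token without commas; `unexpected_filter_keys` is a hypothetical helper, not part of PyVCF or PyVCF3.

```python
import re

# Keys defined for ##FILTER lines in VCF 4.2 (Source/Version belong to INFO lines).
EXPECTED_FILTER_KEYS = {"ID", "Description"}

# key=value, where value is a double-quoted string (may contain commas)
# or a bare token that runs up to the next comma or '>'.
FIELD_PATTERN = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,>]*)')


def unexpected_filter_keys(line):
    """Hypothetical helper: return keys in a ##FILTER line other than ID/Description."""
    body = line.strip()
    if not (body.startswith("##FILTER=<") and body.endswith(">")):
        raise ValueError(f"not a FILTER header line: {line!r}")
    inner = body[len("##FILTER=<"):-1]
    keys = [m.group(1) for m in FIELD_PATTERN.finditer(inner)]
    return [key for key in keys if key not in EXPECTED_FILTER_KEYS]


print(unexpected_filter_keys('##FILTER=<ID=xxx,Description="yyy",Source="zzz">'))  # ['Source']
```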
Please comment on this issue on PyVCF3: https://github.com/dridk/PyVCF3/issues/1