python-bibtexparser
Non-Entries/Comments are not interpreted correctly
According to btxdoc, Section 4, item 7, p. 13:
BibTeX allows in the database files any comment that's not within an entry. If you want to comment out an entry, simply remove the @ character preceding the entry type.
However, when you parse the following database:
@ARTICLE{bla,
author={Fartsy},
title={Title},
}
year=nonsense,
year gets added to the entry, like so:
{u'}\n\nyear': u'nonsense,', 'ID': 'bla', ... }
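For comparison, here is a minimal sketch of the behaviour btxdoc describes, where anything outside an entry is an implicit comment. It is not bibtexparser's implementation; the function name split_entries_and_comments is hypothetical, and it assumes brace-delimited entries only (no @ARTICLE(...) form, no quoted strings containing braces).

```python
def split_entries_and_comments(text):
    """Split a BibTeX database into (entries, comments).

    Assumption: an entry starts at '@' and ends at the brace that closes
    its first '{'. Everything outside an entry is an implicit comment.
    """
    entries, comments = [], []
    pos = 0
    while pos < len(text):
        at = text.find("@", pos)
        if at == -1:
            at = len(text)
        # Text between the previous entry and the next '@' is a comment.
        outside = text[pos:at].strip()
        if outside:
            comments.append(outside)
        if at == len(text):
            break
        # Walk forward from '@', tracking brace depth until the entry closes.
        depth = 0
        opened = False
        for i in range(at, len(text)):
            if text[i] == "{":
                depth += 1
                opened = True
            elif text[i] == "}":
                depth -= 1
            if opened and depth == 0:
                end = i + 1
                break
        else:
            end = len(text)  # unterminated entry: take the rest of the input
        entries.append(text[at:end])
        pos = end
    return entries, comments


db = "@ARTICLE{bla,\nauthor={Fartsy},\ntitle={Title},\n}\nyear=nonsense,\n"
entries, comments = split_entries_and_comments(db)
print(comments)  # ['year=nonsense,']
```

With this split, year=nonsense, ends up in the comments list instead of being attached to the bla entry.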
Additionally, although this is incompatible with BibTeX, Biber allows the LaTeX comment symbol % to be used, e.g.
@ARTICLE{bla,
author={Fartsy},
%title={Title},
}
However, bibtexparser does not treat the character as a comment marker and creates a field %title from it.
Original issue is solved thanks to #64.
Not sure whether adding % support is a good idea. What do you think @sciunto?
I think we should probably support that better. It means we have to consider
@ARTICLE{bla,
author={Fartsy},
% title={Title},
}
as well.
The minimum is to add a note to the documentation.
Partial support would be to make sure that the example above either creates a field %title or ignores the field. Whichever behaviour we choose, it should be documented.
Right now, having something like %title={Title}, or % title={Title}, raises an error on parsing.
Supporting the former is quite trivial by extending the characters allowed in field names (replace bibtexexpression.py#L145 by field_name = pp.Word(pp.alphanums + '_-()%')('FieldName')), but that does not cover the latter. Alternatively, we could easily enable commented fields if this is desired, but what should we do with the following?
abstract = {A multi-line value with a
% in the
middle of it},
%abstract = {A multi-line
commented field},
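To make the two points above concrete, here is a rough stdlib-only sketch. The regex is only a stand-in for the widened pyparsing field name (same character set, followed by =), not the library's real grammar, and strip_percent_lines is a hypothetical, naive line-based comment filter, not what Biber does.

```python
import re

# Stand-in for pp.Word(pp.alphanums + '_-()%') followed by '='.
FIELD_START = re.compile(r"\s*([A-Za-z0-9_\-()%]+)\s*=")


def field_name(line):
    """Return the field name at the start of a line, or None if no match."""
    m = FIELD_START.match(line)
    return m.group(1) if m else None


def strip_percent_lines(text):
    """Naively drop every line whose first non-blank character is '%'."""
    kept = [ln for ln in text.splitlines() if not ln.lstrip().startswith("%")]
    return "\n".join(kept)


print(field_name("%title={Title},"))   # %title
print(field_name("% title={Title},"))  # None

sample = """abstract = {A multi-line value with a
% in the
middle of it},
%abstract = {A multi-line
commented field},"""
print(strip_percent_lines(sample))
```

The widened character set accepts %title but still fails on % title, since the space ends the name before the =. The line filter silently removes "% in the" from inside the first abstract value and leaves "commented field}," dangling with an unbalanced brace, which is exactly the ambiguity in question.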
Yes, supporting comments can very quickly lead to such issues :/ I'll try to take a deeper look at what exactly Biber does.
Relevant parts for v2 extracted to its own issue: #372