python-bibtexparser
Normalize field keys (to lowercase)
Describe the bug
I have several .bib files that contain mixed field keys: some are lowercase, others start with a capital letter, such as "Author" and "Title". No other tooling complains about this.
SeparateCoAuthors does not work, and I cannot uniformly access the fields using e.g. entry['title'].
A normalization to lowercase of the field keys was conducted in v1.
Maybe this can be fixed using a middleware? I would be really grateful!
Reproducing
Version: e3757c13abf2784bda612464843ab30256317e6c
Code:
#!/usr/bin/python
import bibtexparser
import bibtexparser.middlewares as m

layers = [
    m.LatexDecodingMiddleware(),
    m.MonthIntMiddleware(True),  # Months should be represented as int (1-12)
    m.SeparateCoAuthors(True),  # Co-authors should be separated as list of strings
    m.SplitNameParts(True),  # Individual names should be split into first, von, last, jr parts
    m.MergeNameParts("last", True),  # Individual names should be merged to "Last, First..."
]
bib_database = bibtexparser.parse_file('data/Survey.bib', append_middleware=layers)
for entry in bib_database.entries:
    print(entry['title'])
Bibtex:
@InCollection{Name2006,
  Title = {A Title},
  Author = {Name, First and Name, Second},
  Booktitle = {ITS},
  Publisher = {Some publisher},
  Year = {2006},
  Pages = {61--70}
}
Remaining Questions (Optional) Please tick all that apply:
- [ ] I would be willing to contribute a PR to fix this issue.
- [ ] This issue is a blocker, I'd be grateful for an early fix.
Thanks!
- [ ] We should add a middleware that normalizes field names.
- [ ] We could consider a default lower-case mapping.
Maybe something like this (Works For Me™)?
from typing import Collection, Union

import bibtexparser
from bibtexparser.library import Library
from bibtexparser.middlewares.middleware import BlockMiddleware
from bibtexparser.model import Block, Entry

class NormalizeFieldNames(BlockMiddleware):
    """Lowercase the key of every field in each entry."""

    def __init__(self, allow_inplace_modification: bool = True):
        super().__init__(allow_inplace_modification=allow_inplace_modification,
                         allow_parallel_execution=True)

    def transform_entry(self, entry: Entry, library: "Library") -> Union[Block, Collection[Block], None]:
        for field in entry.fields:
            field.key = field.key.lower()
        return entry
Usage example:
library = bibtexparser.parse_file(
    filename,
    append_middleware=[
        NormalizeFieldNames(),
        bibtexparser.middlewares.SeparateCoAuthors(),
        bibtexparser.middlewares.SplitNameParts(),
    ],
)
That's probably alright. Would you be willing to convert it to a PR (adding a test)? I think this is quite a common use case that we should support.
Fully agree with @tdegeus, and would appreciate a PR by @Technologicat
Just one remark: We'd have to be able to handle "new" duplicates somehow (i.e., if two field keys exist in the original block which only differ in their capitalization). That's particularly important now that we're pushing the use of entries as dicts. In principle, we have a block type DuplicateFieldKeyBlock that should be used here, but I am also happy to support additional suggestions. These would probably have to be enabled with a corresponding constructor parameter (e.g. raising an exception). Does this make sense?
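To make the "new duplicates" case concrete, here is a minimal sketch of how keys that differ only in capitalization could be detected, with an optional switch to raise instead of just reporting. It works on plain strings rather than bibtexparser objects, and `find_case_collisions` and `raise_on_collision` are hypothetical names for illustration, not part of the library.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_case_collisions(
    keys: Iterable[str], raise_on_collision: bool = False
) -> Dict[str, List[str]]:
    """Return groups of keys that differ only in capitalization.

    Hypothetical helper for illustration; not part of bibtexparser.
    """
    # Group the original spellings under their lowercase form.
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        groups[key.lower()].append(key)

    # Only groups with more than one spelling would become duplicates.
    collisions = {
        lower: originals
        for lower, originals in groups.items()
        if len(originals) > 1
    }
    if collisions and raise_on_collision:
        raise ValueError(f"Field keys collide after lowercasing: {collisions}")
    return collisions
```

A middleware could run a check like this on the keys of `entry.fields` before renaming them, and then decide whether to raise, warn, or produce a different block.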
@tdegeus: Sure.
@MiWeiss: Good point about conflicting keys. But I'll need a bit more information about the desired way to tackle it.
Roughly what happened: yesterday I suddenly needed to extract some data from BibTeX in Python.
Within an hour, I had installed bibtexparser, upgraded it to 2.x, run into this issue (since my datafiles happened to use capitalized keys), written the simplest possible field key normalizer, and posted a copy here. So it's fair to say I'm kind of new to this project :)
A solution would be to issue a warning (similar to library.failed_blocks) and use the last key value.
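As a rough sketch of that suggestion, the snippet below lowercases keys, logs a warning when two keys collapse into one, and keeps the last value. It operates on plain `(key, value)` pairs and uses the standard `logging` module; the function name `normalize_keys_last_wins` is an assumption for illustration and is not how the library or the eventual PR necessarily does it.

```python
import logging
from typing import Dict, Iterable, Tuple

logger = logging.getLogger(__name__)


def normalize_keys_last_wins(fields: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Lowercase field keys; on conflict, warn and keep the last value.

    Hypothetical helper for illustration; not part of bibtexparser.
    """
    normalized: Dict[str, str] = {}
    for key, value in fields:
        lower = key.lower()
        if lower in normalized:
            # A key with the same lowercase form was seen earlier.
            logger.warning(
                "Duplicate field key %r after lowercasing; keeping the last value.",
                lower,
            )
        normalized[lower] = value  # Later occurrences overwrite earlier ones.
    return normalized
```

For example, passing `[("Title", "A"), ("title", "B")]` would log one warning and leave `"B"` under `title`.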
@csware: Thanks. Yes, that's one possible solution, and probably the simplest one that works.
~Considering alternatives, what about the DuplicateFieldKeyBlock mentioned by @MiWeiss?~ EDIT: Never mind, I think I understand what you all meant now.
Implemented, using @csware's suggestion of emitting a warning and letting the last value win. Please review.