python-bibtexparser
python-bibtexparser copied to clipboard
Warn if parser called multiple times
Consider the snippet shown below. The second parses call returns 4 entries, I would have expected three.
I do see use cases where this is intended, and changing this could break some dependent applications. Hence, I propose to raise a warning if the parse function is called more than once, with some way to disable the warning.
(issue identified as part of #247)
import bibtexparser
from bibtexparser.bparser import BibTexParser
parser = BibTexParser()
a = """
@article{Cesar2013,
author = {Jean César},
title = {An amazing title},
year = {2013},
month = "jan",
volume = {12},
pages = {12-23},
journal = {Nice Journal},
abstract = {This is an abstract. This line should be long enough to test
multilines... and with a french érudit word},
comments = {A comment},
keyword = {keyword1, keyword2},
}
"""
a_database = bibtexparser.loads(a, parser)
print(len(a_database.entries))
b = """
@article{FuMetalhalideperovskite2019,
author = {Yongping Fu and Haiming Zhu and Jie Chen and Matthew P. Hautzinger and X.-Y. Zhu and Song Jin},
doi = {10.1038/s41578-019-0080-9},
journal = {Nature Reviews Materials},
month = {feb},
number = {3},
pages = {169-188},
publisher = {Springer Science and Business Media {LLC}},
title = {Metal halide perovskite nanostructures for optoelectronic applications and the study of physical properties},
url = {https://www.nature.com/articles/s41578-019-0080-9},
volume = {4},
year = {2019}
}
@article{SunEnablingSiliconSolar2014,
author = {Ke Sun and Shaohua Shen and Yongqi Liang and Paul E. Burrows and Samuel S. Mao and Deli Wang},
doi = {10.1021/cr300459q},
journal = {Chemical Reviews},
month = {aug},
number = {17},
pages = {8662-8719},
publisher = {American Chemical Society ({ACS})},
title = {Enabling Silicon for Solar-Fuel Production},
url = {http://pubs.acs.org/doi/10.1021/cr300459q},
volume = {114},
year = {2014}
}
@article{LiuPhotocatalytichydrogenproduction2016,
author = {Maochang Liu and Yubin Chen and Jinzhan Su and Jinwen Shi and Xixi Wang and Liejin Guo},
doi = {10.1038/nenergy.2016.151},
impactfactor = {54.000},
journal = {Nature Energy},
month = {sep},
number = {11},
pages = {16151},
publisher = {Springer Science and Business Media {LLC}},
title = {Photocatalytic hydrogen production using twinned nanocrystals and an unanchored {NiSx} co-catalyst},
url = {http://www.nature.com/articles/nenergy2016151},
volume = {1},
year = {2016}
}
"""
b_database = bibtexparser.loads(b, parser)
Hi @MiWeiss, May I work on this?
I assume you want to finish the process with exit code 0 instead of assertion error and exit code 1. Am I right?
Hi @mrn97
Thanks! Yes, a contribution would be very welcome.
I'm afraid we cannot change the functionality due to backwards compatibility (see comment above). Instead, I propose to raise a warning warnings.warn if the parse function is called more than once, with some way to disable the warning.
For clarification, I removed the assertion in the snippet above.
The "some way to disable the warning" would probably have to be a flag on the parser, except if you have another idea.
Dear @MiWeiss, thanks for the opportunity. I thought you would want something like this len(b_database.entries) == 3, warnings.warn(f"Expected 3 entries, got {len(b_database.entries)}") instead of assertion error, but you want a warning as a flag on the parser. right?
No, the original assertion was there just to highlight the problem in the snippet ;-)
I imagine the following:
- The parser should call
warnings.warnif parse is being called multiple times. - This warning can be disabled by setting
parser.expect_multiple_parse = True(such flag has to be added)
Thanks for the hints I think the following would solve the problem.
bparser.py
def __init__(self, data=None,
customization=None,
ignore_nonstandard_types=True,
homogenize_fields=False,
interpolate_strings=True,
common_strings=True,
add_missing_from_crossref=False,
expect_multiple_parse=False):
self.expect_multiple_parse = expect_multiple_parse
if not self.expect_multiple_parse:
warnings.warn("Parser has been called more than once.")
parser = BibTexParser(common_strings=False, expect_multiple_parse=True)
and I think we need to do that if the class is initialized more than once like using something like __class__.count.right?
The warning should only be raised if the actual parse function is called more than once
I think I can say this is what you want but, counting the number of method calls is a little bit challenging and I had to add another method which I don't know if that's ok with you or not.
@staticmethod
def count_calls(func):
name = func.__name__
@wraps(func)
def wrapper(self, *args, **kwargs):
counter = getattr(self, "_calls_counter", None)
if counter is None:
counter = self._calls_counter = defaultdict(int)
counter[name] += 1
return func(self, *args, **kwargs)
wrapper._is_count_call_wrapper = True
return wrapper
def parse(self, bibtex_str, partial=False, expect_multiple_parse=False):
"""Parse a BibTeX string into an object
:param expect_multiple_parse: if True, does not print warnings
:param bibtex_str: BibTeX string
:type: str or unicode
:param partial: If True, print errors only on parsing failures.
If False, an exception is raised.
:type: boolean
:return: bibliographic database
:rtype: BibDatabase
"""
call_counter = BibTexParser.count_calls(parse)
if call_counter > 1 and not expect_multiple_parse:
warnings.warn("Parser has been called more than once.")
parser = BibTexParser(common_strings=False, expect_multiple_parse=True)
Why not just a self._parse_call_count field?
No disrespect - how experienced are you as a programmer?
This fix is quite trivial, and I am not sure if we are getting there...
That's ok, I'm not that experienced, and I'm trying to send my first PR to an open-source project, I think I'm almost done, but if you think I'm wasting your time then that's your choice.
If a self._parse_call_count field works I think we are done and I can send a PR.
I see - well, let's try a bit further to get you to your first accepted PR =)
Open the PR and then I'll give you more review comments right there...
Dear @MiWeiss, I've sent a PR, please feel free to ask for any changes.
I see - well, let's try a bit further to get you to your first accepted PR =)
Open the PR and then I'll give you more review comments right there...
Dear @MiWeiss Thanks for your time. and, actually it's not my first PR, it's my first PR for an open source project and I appreciate for your time.
Your PR has failing tests, amongst others as it uses as param on the parse method (when it should be a field)...Please make sure you get the tests to pass, then the review will follow.
I just sent another PR.
I've also run tests and they were ok.
Dear @MiWeiss, all tests have been passed successfully please give me review comments any time.