python-bibtexparser icon indicating copy to clipboard operation
python-bibtexparser copied to clipboard

Warn if parser called multiple times

Open MiWeiss opened this issue 3 years ago • 17 comments

Consider the snippet shown below. The second parses call returns 4 entries, I would have expected three.

I do see use cases where this is intended, and changing this could break some dependent applications. Hence, I propose to raise a warning if the parse function is called more than once, with some way to disable the warning.

(issue identified as part of #247)

import bibtexparser

from bibtexparser.bparser import BibTexParser

parser = BibTexParser()

a = """
@article{Cesar2013,
  author = {Jean César},
  title = {An amazing title},
  year = {2013},
  month = "jan",
  volume = {12},
  pages = {12-23},
  journal = {Nice Journal},
  abstract = {This is an abstract. This line should be long enough to test
multilines... and with a french érudit word},
  comments = {A comment},
  keyword = {keyword1, keyword2},
}
"""
a_database = bibtexparser.loads(a, parser)

print(len(a_database.entries))

b = """
@article{FuMetalhalideperovskite2019,
    author = {Yongping Fu and Haiming Zhu and Jie Chen and Matthew P. Hautzinger and X.-Y. Zhu and Song Jin},
    doi = {10.1038/s41578-019-0080-9},
    journal = {Nature Reviews Materials},
    month = {feb},
    number = {3},
    pages = {169-188},
    publisher = {Springer Science and Business Media {LLC}},
    title = {Metal halide perovskite nanostructures for optoelectronic applications and the study of physical properties},
    url = {https://www.nature.com/articles/s41578-019-0080-9},
    volume = {4},
    year = {2019}
}

@article{SunEnablingSiliconSolar2014,
    author = {Ke Sun and Shaohua Shen and Yongqi Liang and Paul E. Burrows and Samuel S. Mao and Deli Wang},
    doi = {10.1021/cr300459q},
    journal = {Chemical Reviews},
    month = {aug},
    number = {17},
    pages = {8662-8719},
    publisher = {American Chemical Society ({ACS})},
    title = {Enabling Silicon for Solar-Fuel Production},
    url = {http://pubs.acs.org/doi/10.1021/cr300459q},
    volume = {114},
    year = {2014}
}

@article{LiuPhotocatalytichydrogenproduction2016,
    author = {Maochang Liu and Yubin Chen and Jinzhan Su and Jinwen Shi and Xixi Wang and Liejin Guo},
    doi = {10.1038/nenergy.2016.151},
    impactfactor = {54.000},
    journal = {Nature Energy},
    month = {sep},
    number = {11},
    pages = {16151},
    publisher = {Springer Science and Business Media {LLC}},
    title = {Photocatalytic hydrogen production using twinned nanocrystals and an unanchored {NiSx} co-catalyst},
    url = {http://www.nature.com/articles/nenergy2016151},
    volume = {1},
    year = {2016}
}
"""
b_database = bibtexparser.loads(b, parser)

MiWeiss avatar Jul 10 '22 10:07 MiWeiss

Hi @MiWeiss, May I work on this?

I assume you want to finish the process with exit code 0 instead of assertion error and exit code 1. Am I right?

echomehran avatar Jul 10 '22 12:07 echomehran

Hi @mrn97

Thanks! Yes, a contribution would be very welcome.

I'm afraid we cannot change the functionality due to backwards compatibility (see comment above). Instead, I propose to raise a warning warnings.warn if the parse function is called more than once, with some way to disable the warning.

For clarification, I removed the assertion in the snippet above.

The "some way to disable the warning" would probably have to be a flag on the parser, except if you have another idea.

MiWeiss avatar Jul 10 '22 12:07 MiWeiss

Dear @MiWeiss, thanks for the opportunity. I thought you would want something like this len(b_database.entries) == 3, warnings.warn(f"Expected 3 entries, got {len(b_database.entries)}") instead of assertion error, but you want a warning as a flag on the parser. right?

echomehran avatar Jul 10 '22 12:07 echomehran

No, the original assertion was there just to highlight the problem in the snippet ;-)

I imagine the following:

  • The parser should call warnings.warn if parse is being called multiple times.
  • This warning can be disabled by setting parser.expect_multiple_parse = True (such flag has to be added)

MiWeiss avatar Jul 10 '22 13:07 MiWeiss

Thanks for the hints I think the following would solve the problem.

bparser.py

def __init__(self, data=None,
                 customization=None,
                 ignore_nonstandard_types=True,
                 homogenize_fields=False,
                 interpolate_strings=True,
                 common_strings=True,
                 add_missing_from_crossref=False,
                 expect_multiple_parse=False):

        self.expect_multiple_parse = expect_multiple_parse
        if not self.expect_multiple_parse:
            warnings.warn("Parser has been called more than once.")

parser = BibTexParser(common_strings=False, expect_multiple_parse=True)

and I think we need to do that if the class is initialized more than once like using something like __class__.count.right?

echomehran avatar Jul 10 '22 14:07 echomehran

The warning should only be raised if the actual parse function is called more than once

MiWeiss avatar Jul 10 '22 14:07 MiWeiss

I think I can say this is what you want but, counting the number of method calls is a little bit challenging and I had to add another method which I don't know if that's ok with you or not.

    @staticmethod
    def count_calls(func):
        name = func.__name__

        @wraps(func)
        def wrapper(self, *args, **kwargs):
            counter = getattr(self, "_calls_counter", None)
            if counter is None:
                counter = self._calls_counter = defaultdict(int)
            counter[name] += 1
            return func(self, *args, **kwargs)

        wrapper._is_count_call_wrapper = True
        return wrapper

    def parse(self, bibtex_str, partial=False, expect_multiple_parse=False):
        """Parse a BibTeX string into an object

        :param expect_multiple_parse: if True, does not print warnings
        :param bibtex_str: BibTeX string
        :type: str or unicode
        :param partial: If True, print errors only on parsing failures.
            If False, an exception is raised.
        :type: boolean
        :return: bibliographic database
        :rtype: BibDatabase
        """

        call_counter = BibTexParser.count_calls(parse)
        if call_counter > 1 and not expect_multiple_parse:
            warnings.warn("Parser has been called more than once.")

parser = BibTexParser(common_strings=False, expect_multiple_parse=True)

echomehran avatar Jul 10 '22 14:07 echomehran

Why not just a self._parse_call_count field?

MiWeiss avatar Jul 10 '22 14:07 MiWeiss

No disrespect - how experienced are you as a programmer?

This fix is quite trivial, and I am not sure if we are getting there...

MiWeiss avatar Jul 10 '22 14:07 MiWeiss

That's ok, I'm not that experienced, and I'm trying to send my first PR to an open-source project, I think I'm almost done, but if you think I'm wasting your time then that's your choice.

echomehran avatar Jul 10 '22 14:07 echomehran

If a self._parse_call_count field works I think we are done and I can send a PR.

echomehran avatar Jul 10 '22 14:07 echomehran

I see - well, let's try a bit further to get you to your first accepted PR =)

Open the PR and then I'll give you more review comments right there...

MiWeiss avatar Jul 10 '22 14:07 MiWeiss

Dear @MiWeiss, I've sent a PR, please feel free to ask for any changes.

echomehran avatar Jul 10 '22 15:07 echomehran

I see - well, let's try a bit further to get you to your first accepted PR =)

Open the PR and then I'll give you more review comments right there...

Dear @MiWeiss Thanks for your time. and, actually it's not my first PR, it's my first PR for an open source project and I appreciate for your time.

echomehran avatar Jul 10 '22 15:07 echomehran

Your PR has failing tests, amongst others as it uses as param on the parse method (when it should be a field)...Please make sure you get the tests to pass, then the review will follow.

MiWeiss avatar Jul 10 '22 15:07 MiWeiss

I just sent another PR.

I've also run tests and they were ok.

echomehran avatar Jul 10 '22 16:07 echomehran

Dear @MiWeiss, all tests have been passed successfully please give me review comments any time.

echomehran avatar Jul 11 '22 09:07 echomehran