typed-argument-parser icon indicating copy to clipboard operation
typed-argument-parser copied to clipboard

TAP works much slower than standard ArgumentParser

Open Enolerobotti opened this issue 4 years ago • 4 comments

TAP works very slow: to reproduce

from tap import Tap
from argparse import ArgumentParser
from profilehooks import profile, timecall


class SimpleArgumentParser(Tap):
    name: str  # Your name
    language: str = 'Python'  # Programming language
    package: str = 'Tap'  # Package name
    stars: int  # Number of stars
    max_stars: int = 5  # Maximum stars


@profile(dirs=True, sort='time')
@timecall()
def tap_parser():
    return SimpleArgumentParser().parse_args()


@profile(dirs=True, sort='time')
@timecall()
def arg_parser():
    parser = ArgumentParser()
    parser.add_argument('--name', type=str, required=True,
                        help='Your name')
    parser.add_argument('--language', type=str, default='Python',
                        help='Programming language')
    parser.add_argument('--package', type=str, default='Tap',
                        help='Package name')
    parser.add_argument('--stars', type=int, required=True,
                        help='Number of stars')
    parser.add_argument('--max_stars', type=int, default=5,
                        help='Maximum stars')
    return parser.parse_args()


tap_args = tap_parser()
args = arg_parser()

print(f'My name is {tap_args.name} and I give the {tap_args.language} package '
      f'{tap_args.package} {tap_args.stars}/{tap_args.max_stars} stars!')
print(f'My name is {args.name} and I give the {args.language} package '
      f'{args.package} {args.stars}/{args.max_stars} stars!')

tap_parser: 0.050 seconds (tap 1.6.1) tap_parser: 0.042 seconds (tap 1.4.3) arg_parser: 0.000 seconds

The most time consuming actions are ncalls tottime percall cumtime percall filename:lineno(function) 7754 0.013 0.000 0.032 0.000 /home/artem/.conda/envs/rdclone/lib/python3.6/tokenize.py:492(_tokenize) 7148 0.006 0.000 0.006 0.000 {method 'match' of '_sre.SRE_Pattern' objects}

I assume these two involve regular expressions.

Environment Ubuntu 20.04.2 LTS Python 3.6.10 conda 4.9.2 profilehooks 1.12.0

Enolerobotti avatar Feb 20 '21 11:02 Enolerobotti

Hi @Enolerobotti,

Thank you for bringing this to our attention! Yes, the slow part is the source code inspection in order to extract the help strings from comments. In our machine learning workflows, we find this overhead to be acceptable, but we can imagine that certain applications may need a faster argument parser. In the future, we will add a flag to disable this. Concretely, we would skip this call: https://github.com/swansonk14/typed-argument-parser/blob/0dc69b36dfd54c3a5cb07daa92f7f0f680437064/tap/tap.py#L93

Best, Kyle and Jesse

martinjm97 avatar Feb 20 '21 19:02 martinjm97

Hi @martinjm97,

Thank you for your reply! Indeed, the _get_class_variables method works slow. The problem is that when I try to use ChemProp in my workflow, I need some default arguments (which are defined inside a TAP instance) to be loaded several times. I have to initiate TAP based classes several times due to an assertion (btw, for me it looks redundant). So, for ensemble of 10 models, I spend additional 0.5 sec per molecule to predict. For list longer than 1000 molecules it would be worth for me to get rid of redundant operations.

Cheers, Artem

Enolerobotti avatar Feb 22 '21 15:02 Enolerobotti

@Enolerobotti,

That makes sense! I didn't think a Tap would be in an inner loop. We'll add the flag soon and then you can save the 500s for your 1000 molecules! Thanks for letting us know and for the clear explanation.

--Jesse

martinjm97 avatar Feb 22 '21 16:02 martinjm97

Encountering the same issue, but now it seems like the class variables are being used on line 72 in _add_argument:

            # Description
            if variable in self.class_variables:
                kwargs['help'] += ' ' + self.class_variables[variable]['comment']

and on line 298 in _add_arguments:

        # Add class variables (in order)
        for variable in self.class_variables:

As that variable is now also being used for ordering, is it now not possible to make this change as noted above?

jackdewinter avatar Jul 01 '23 17:07 jackdewinter