parse_it icon indicating copy to clipboard operation
parse_it copied to clipboard

potential slowness when dealing with large number of configuration values

Open ThomasWaldmann opened this issue 6 years ago • 8 comments

the way the code does it right now is that it does everything per configuration key:

  • a full argparse
  • checking env vars (no big issue, this is quite fast as it is a dict already)
  • a full config file read / parse

if you have a lot of configuration keys, this will get slow.

ThomasWaldmann avatar May 09 '19 22:05 ThomasWaldmann

as argparse already has a quite extensive infrastructure, quite some projects will have a more or less big argparse parser setup already (including help strings, typing, defaults, etc.).

maybe you could start from doing argparse just once and then finding out what attributes it has and then checking the env vars / config based on that (also only once for all).

ThomasWaldmann avatar May 09 '19 22:05 ThomasWaldmann

see there for a monster: https://github.com/borgbackup/borg/blob/1.2.0a6/src/borg/archiver.py#L2465

ThomasWaldmann avatar May 09 '19 22:05 ThomasWaldmann

Not sure scalability is the right phrasing for the issue you raise as it suggest that after X amount of configuration keys parse_it will cease working... I agree with your assessment that the more configuration keys needed the longer it will take so gonna change the title to something more descriptive of the problem

naorlivne avatar May 12 '19 12:05 naorlivne

It should also be noted that unless you're dealing with thousands of configuration values it should still be a matter of a fraction of a second to go over them all (or so at least the tests I've ran so far seem to suggest)... that been said there is always room for improvement, as well as the argparse pre-reading suggestion @ThomasWaldmann raised I think it may be wise to run some speed tests to find what part of the process is the slowest and focus on that as without it it's just guess works.

naorlivne avatar May 12 '19 13:05 naorlivne

besides speed, it is also an architectural thing.

maybe it's just not an elegant way to do it like that. it just works by brute force somehow. :-)

ThomasWaldmann avatar May 12 '19 16:05 ThomasWaldmann

The slowest part seems to be when forcing an envvars to be upper-case, the following unit test takes ~114ms on my laptop:

    def test_parser_read_configuration_variable_force_envvars_uppercase_true(self):
        parser = ParseIt(force_envvars_uppercase=True)
        os.environ["TEST_ENVVAR_ESTIMATE_TRUE_INT"] = "123"
        os.environ["test_envvar_estimate_true_int"] = "456"
        reply = parser.read_configuration_variable("test_envvar_estimate_true_int")
        self.assertEqual(reply, 123)
        self.assertNotEqual(reply, 456)

where as the unit test of reading variables without them being forced to be upper-case takes <1ms on my laptop:

    def test_parser_read_configuration_variable_force_envvars_uppercase_false(self):
        parser = ParseIt(force_envvars_uppercase=False, config_folder_location=test_files_location)
        os.environ["TEST_ENVVAR_ESTIMATE_TRUE_INT"] = "123"
        os.environ["test_envvar_estimate_true_int"] = "456"
        reply = parser.read_configuration_variable("test_envvar_estimate_true_int")
        self.assertNotEqual(reply, 123)
        self.assertEqual(reply, 456)

All other tests (cli args included) takes considerably less time then test_parser_read_configuration_variable_force_envvars_uppercase_true so thinking that should be the focus of speed improvements as it will likely have the biggest pay off.

naorlivne avatar May 13 '19 13:05 naorlivne

uppercasing some string is a quick operation. but you need to be careful that the tests are really the same other than that (hint: they are not) and that normal timing fluctuations on you multitasking machine might be way bigger than what you are measuring.

ThomasWaldmann avatar May 13 '19 22:05 ThomasWaldmann

Your right about the config_folder_location diffrence being the root cause of the slowness there, if not set it is using the current working directory as returned from os.getcwd() which is apparently takes >100ms, luckily as this is only a one time request when setting the ParseIt object it will not be called for every configuration value being read.

naorlivne avatar May 14 '19 07:05 naorlivne