cpython icon indicating copy to clipboard operation
cpython copied to clipboard

gh-96859: [argparse] Avoid O(N^2) behavior while consuming optionals

Open alexeypa opened this issue 3 years ago • 11 comments
trafficstars

The loop consuming optional and positional arguments in ArgumentParser._parse_known_args() rescans option_string_indices on every iteration of the loop in order to compute the next option index:

next_option_string_index = min([
    index
    for index in option_string_indices
    if index >= start_index])

This becomes very slow when parsing a very long list of arguments, e.g. when parsing a response file.

This commit refactors the loop to scan the option indices list only once.

  • Issue: gh-96859

alexeypa avatar Sep 18 '22 01:09 alexeypa

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

bedevere-bot avatar Sep 18 '22 01:09 bedevere-bot

All commit authors signed the Contributor License Agreement.
CLA signed

cpython-cla-bot[bot] avatar Sep 18 '22 01:09 cpython-cla-bot[bot]

@bitdancer, please share your reviews.

kumiDa avatar Sep 18 '22 08:09 kumiDa

Tests / Windows (x86) (pull_request) Failing after 31m https://github.com/python/cpython/actions/runs/3091293898/jobs/5001192381

This does not seem related to the changes on the PR. Also the latest commit only touched comments.

alexeypa avatar Sep 20 '22 15:09 alexeypa

Tests / Windows (x86) (pull_request) Failing after 31m https://github.com/python/cpython/actions/runs/3091293898/jobs/5001192381

This does not seem related to the changes on the PR. Also the latest commit only touched comments.

Try closing and reopening the PR to get the check to re-run.

AlexWaygood avatar Sep 20 '22 16:09 AlexWaygood

BTW, what is the etiquette around commits addressing review feedback that are not really meaningful changes by themselves? Should I squash them before the PR is merged?

Please don't! All CPython PRs are squash-merged anyway, so you don't need to; and PR authors squashing commits together makes it harder for reviewers to see what changed between commits due to the GitHub UI :)

AlexWaygood avatar Sep 20 '22 16:09 AlexWaygood

A friendly ping to bring some attention to this PR.

@rhettinger, would you be the best person to do a core review? You listed as the maintainer of argparse.py at the top of the file. Please let me know if I should ping someone else.

alexeypa avatar Oct 22 '22 18:10 alexeypa

I used to contribute a lot to argparse as paul.j3, but haven't followed it much since the github move. So this will be a general comment.

Backward compatibility has been a major issue; I can think of several changes that either had to be removed, or have remained and continue to cause problem. Thus I favor bug fixes, and feature enhancements that have little chance of causing problems. I discourage core changes that aren't essential.

This looks like a core change simply for the sake of performance. I'd vote against it, unless it has been thoroughly tested, not just read through by one or two people. I don't think there are enough argparse developers to do that well.

Most users don't use argparse for large inputs. If performance came up on StackOverFlow, I'd tell them to read the input(s) from a file, and reserve argparse for short lists - file names and controls, not data.

hpaulj avatar Nov 04 '22 16:11 hpaulj

Thanks for your feedback, @hpaulj. The change is indeed fixes a performance issue and attempts to not modify the existing behavior in any way. The test_argparse.py suite looks pretty comprehensive and tests a number is of cases involving append actions and a mix of optional/positional arguments. However it does not test the mix of argparse and 3rd party apps using it - it is not an appcompat test suite. I don't believe I can provide any evidence showing that this change does not introduce breakage with in a large body of 3rd party code. If this means this change does not meet the bar - so be it.

If performance came up on StackOverFlow, I'd tell them to read the input(s) from a file, and reserve argparse for short lists - file names and controls, not data.

The specific case this issue came up in was parsing a response file generated by a build system. It was a reasonable legitimate case of passing diverse command line parameters via a file due to size constraints. I'd agree this is a corner case.

alexeypa avatar Nov 07 '22 06:11 alexeypa

From a quick read of your changes it looks like you are replacing the set, option_string_indices with list option_tuples and option_tuple_index.

Years ago I added parse_intermixed_args. I created a whole new pair of functions rather than add some sort of parameter and alternative parsing path in the default parse_args. You might consider something similar with a parse_faster_args. At least until we get more feedback, leave the current parser as the default, but provide interested users with an alternative.

A down side to that is it won't get as much beta testing. But I don't have a good feel for how many problems get resolved with Python beta releases.

hpaulj avatar Nov 07 '22 16:11 hpaulj

You might consider something similar with a parse_faster_args

Hm, this is an interesting suggestion...

_parse_known_args() is 200+ lines long. It would not be wise just clone it and apply this patch to the new version. We can probably refactor _parse_known_args() to make it more manageable - move sub-functions out, maybe break it up into smaller functions. It should be possible to do it and demonstrate that all these changes are just moving existing code around.

After this refactoring re-implementing the core parsing loop and exposing it as a new top-level parsing function should be relatively simple. Having two parallel functions also allow cross-checking one against the other.

I'm not sure this plan has lower risk of introducing a backward compatibility issue. The refactoring alone is going to be pretty substantial in terms of the size of the patch...

alexeypa avatar Nov 09 '22 07:11 alexeypa

Closing this PR due to lack of interest.

alexeypa avatar Jan 23 '23 01:01 alexeypa