resholve Add parser for `flatpak` and `flatpak update`

This PR only implements a parser for flatpak and one of its subcommands. I think that #90 would have to be dealt with before a parser that supports all of flatpak’s subcommands and arguments could be implemented.

I’m submitting this as a draft because always crashes at the moment:

Traceback (most recent call last):
  File "/nix/store/r910kd7i50jhfv48r4w2ak6g112vjmj9-resholve-0.9.0/bin/.resholve-wrapped", line 4807, in <module>
    sys.exit(punshow())
  File "/nix/store/r910kd7i50jhfv48r4w2ak6g112vjmj9-resholve-0.9.0/bin/.resholve-wrapped", line 947, in punshow
    resolve_cmdlikes()
  File "/nix/store/r910kd7i50jhfv48r4w2ak6g112vjmj9-resholve-0.9.0/bin/.resholve-wrapped", line 3175, in resolve_cmdlikes
    cmdlike.resolve()
  File "/nix/store/r910kd7i50jhfv48r4w2ak6g112vjmj9-resholve-0.9.0/bin/.resholve-wrapped", line 3147, in resolve
    self._resolve_invocations(solution)
  File "/nix/store/r910kd7i50jhfv48r4w2ak6g112vjmj9-resholve-0.9.0/bin/.resholve-wrapped", line 3106, in _resolve_invocations
    source.look_for_external_sub_execution(self.name, invocation)
  File "/nix/store/r910kd7i50jhfv48r4w2ak6g112vjmj9-resholve-0.9.0/bin/.resholve-wrapped", line 4076, in look_for_external_sub_execution
    if externals[subcmd]:
  File "/nix/store/r910kd7i50jhfv48r4w2ak6g112vjmj9-resholve-0.9.0/bin/.resholve-wrapped", line 1332, in __missing__
    return self.setdefault(key, self.factory(key))
  File "/nix/store/r910kd7i50jhfv48r4w2ak6g112vjmj9-resholve-0.9.0/bin/.resholve-wrapped", line 2472, in generate_external_command_parser
    return getattr(ExternalCommandParsers, cmdname)()
  File "/nix/store/r910kd7i50jhfv48r4w2ak6g112vjmj9-resholve-0.9.0/bin/.resholve-wrapped", line 2422, in _flatpak
    update = subparsers.add_parser("update")
  File "/nix/store/r3qy746fdmdx402xy6x1ymn0n8rx013a-python-2.7.18.6/lib/python2.7/argparse.py", line 1070, in add_parser
    parser = self._parser_class(**kwargs)
TypeError: __init__() got an unexpected keyword argument 'prog'

It looks like CommandParser doesn’t support all of the arguments that ArgumentParser does. Adding a subparser causes a new CommandParser to get created, and that causes the unexpected keyword argument error. I tried adding additional keyword arguments to CommandParser.__init__(), but then I ran into more errors, and I’m still not sure if that’s the right solution to this problem.

Sep 07 '23 00:09 Jayman2000

I have some WIP fixes that are making incremental progress on this, but as I'm working through it I'm thinking I should double-check my understanding.

Generally speaking, I've only been adding parsers to help resholve find the executable argument in the haystack.

I don't see one in the arguments here, and since you mention issue 90, I'm guessing this means that:

flatpak is getting correctly flagged as a likely execer
it doesn't actually have any commands that exec in the ~normal path space, just the issue-90 sort of exec?
you feel like marking it "cannot" would be a bit of a lie?
you're adding arguments not to find a command, but to try to carve some ~safe invocations out, while still letting others throw errors?

Sep 20 '23 04:09 abathur

I have some WIP fixes that are making incremental progress on this, but as I'm working through it I'm thinking I should double-check my understanding.

Generally speaking, I've only been adding parsers to help resholve find the executable argument in the haystack.

I don't see one in the arguments here, and since you mention issue 90, I'm guessing this means that:

flatpak is getting correctly flagged as a likely execer

Yep.

it doesn't actually have any commands that exec in the ~normal path space, just the issue-90 sort of exec?

I’m not sure. I know that flatpak enter executes its arguments, and I assume that it does an issue-90 sort of exec, but I haven’t actually tested it. Regardless of what flatpak enter does, it’s possible that there are other situations where flatpak executes its arguments. flatpak has 43 subcommands, and I don’t want to implement all of them just to get flatpak update to work.

you feel like marking it "cannot" would be a bit of a lie?

Yep.

you're adding arguments not to find a command, but to try to carve some ~safe invocations out, while still letting others throw errors?

Exactly. I’m also interested in submitting another pull request that does the same thing for some of Git’s subcommands.

Sep 20 '23 21:09 Jayman2000

Hehe. Fair. I'll chew on this. Thanks for the confirming.

It doesn't square with how I've been using the parsers, but I agree there's some merit.

I've avoided dealing with git for similar reasons. (I've also dragged my feet a little because I'm not happy with the parsers. And I'd like to figure out how to document these human-review decisions in a way that pins to real source and generates assertions we can run in CI that won't light up on every update but will light up when they add new arguments... some way to see when the knowledge they encode is rotting.)

At least in the face of the problem posed by some of these massive subcommand CLIs, I guess it would be nice to have a way to just dead-end an entire subcommand without having to implement arguments if we've checked it for exec and don't see any. I'll play around with argparse and see if I can figure something out.

Sep 20 '23 22:09 abathur

Curious what you think about the approach (and perhaps more importantly the right term--I suspect dead-end encodes too much of my perspective) here:

https://github.com/abathur/resholve/commit/914061c2c70d32ff057d598152b12dfffd79979b#diff-3791dcc7805ec6044570664bd7360fb4e9441c5987f7df8641b50e8bdac2df48L2421-R2436

Basically, adding support treating a command as handled if we encounter a ~special dest in argparse, and then creating a method for registering those.

Sep 21 '23 06:09 abathur

Curious what you think about the approach (and perhaps more importantly the right term--I suspect dead-end encodes too much of my perspective) here:

914061c#diff-3791dcc7805ec6044570664bd7360fb4e9441c5987f7df8641b50e8bdac2df48L2421-R2436

Basically, adding support treating a command as handled if we encounter a ~special dest in argparse, and then creating a method for registering those.

Here are my notes:

I like the term dead-end parser. It’s very short and descriptive.
I would change this:
```
  ] ++ lib.optionals stdenv.hostPlatform.isLinux [ flatpak
  ];
```
to this:
```
  ] ++ lib.optionals stdenv.hostPlatform.isLinux [ flatpak ];
```
It just looks weird to have the ]; be on a separate line if the item isn’t going to be on a separate line.

This comment is confusing:

Some patterns coalescing:
- generic handler uses the "command" dest to find subcommands
- init parser (by calling make_command_parser) with a name and variant,
  i.e. "sed" and "gnu" or "bsd"
- use a "skip" dest if there are options that are mutually-exclusive
  with sub-execution but we should keep trying alt parsers
- mark entire subcommands as ~safe (having no no exec) with:
  subparser.add_deadend_parser(name, **kwards)

The first bullet point uses the term “subcommands” to refer to arguments that gets executed (e.g., sudo hello). The last bullet point uses the term “subcommands” to refer to commands that exists inside of another command (e.g., git commit). I would change that first bullet point to

- generic handler uses the "command" dest to find arguments that get
run as commands

I would make add_deadend_parser() set add_help to False by default. Otherwise, we’re going to keep writing add_help=False every time we call add_deadend_parser().

I noticed that your commit reverted the changes from 370d6f3 (Allow parsers that don’t handle commands yet, 2023-09-06), and I’m not sure why. When I try to resolve this script,

#!/usr/bin/env bash

flatpak --version

resholve crashes:

WARNING:__main__:CommandParser CommandParser(prog='flatpak (generic)', usage=None, description=None, version=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=False) passing instead of 'too few arguments'
Traceback (most recent call last):
  File "/nix/store/4llg3a96ihldq85x3m28f6fqnw0qcr9v-resholve-0.9.0-914061c/bin/.resholve-wrapped", line 4792, in <module>
    sys.exit(punshow())
  File "/nix/store/4llg3a96ihldq85x3m28f6fqnw0qcr9v-resholve-0.9.0-914061c/bin/.resholve-wrapped", line 947, in punshow
    resolve_cmdlikes()
  File "/nix/store/4llg3a96ihldq85x3m28f6fqnw0qcr9v-resholve-0.9.0-914061c/bin/.resholve-wrapped", line 3158, in resolve_cmdlikes
    cmdlike.resolve()
  File "/nix/store/4llg3a96ihldq85x3m28f6fqnw0qcr9v-resholve-0.9.0-914061c/bin/.resholve-wrapped", line 3130, in resolve
    self._resolve_invocations(solution)
  File "/nix/store/4llg3a96ihldq85x3m28f6fqnw0qcr9v-resholve-0.9.0-914061c/bin/.resholve-wrapped", line 3089, in _resolve_invocations
    source.look_for_external_sub_execution(self.name, invocation)
  File "/nix/store/4llg3a96ihldq85x3m28f6fqnw0qcr9v-resholve-0.9.0-914061c/bin/.resholve-wrapped", line 4090, in look_for_external_sub_execution
    if handler(parsed, invocation):
  File "/nix/store/4llg3a96ihldq85x3m28f6fqnw0qcr9v-resholve-0.9.0-914061c/bin/.resholve-wrapped", line 3892, in handle_external_generic
    for subinvoke in parsed.commands:
AttributeError: 'Namespace' object has no attribute 'commands'

I made a branch that restores the changes from 370d6f3 (Allow parsers that don’t handle commands yet, 2023-09-06), and the error went away.

Every time a flatpak command appears in a script, I get this warning:

WARNING:__main__:CommandParser CommandParser(prog='flatpak (generic)', usage=None, description=None, version=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=False) passing instead of 'too few arguments'

Sep 22 '23 14:09 Jayman2000

I would change this...

Thanks for noticing. I intended both the item and the final bracket on their own lines, but that column in my editor was a bit too narrow and the line-wrap made it look right.

This comment is confusing... The first bullet point uses the term “subcommands” to refer to arguments that gets executed (e.g., sudo hello). The last bullet point uses the term “subcommands” to refer to commands that exists inside of another command (e.g., git commit). I would change that first bullet point to
- generic handler uses the "command" dest to find arguments that get
run as commands

Good catch. I've felt all along like my (the?) language around these executable arguments is a bit slippery. A lot of the most-common terms are very overloaded already, and then Shell helps weird things further by layering in concepts like command substitution. Most of what we care about is external executables executing other user-supplied external executables, but down at the parser layer it also has to close over builtins, some of which can execute more than just external executables.

What about:

- generic handler uses the "command" dest to find arguments that are
  themselves commands we may need to resolve

I would make add_deadend_parser() set add_help to False by default. Otherwise, we’re going to keep writing add_help=False every time we call add_deadend_parser().

Good point.

Every time a flatpak command appears in a script, I get this warning:

WARNING:__main__:CommandParser CommandParser(prog='flatpak (generic)', usage=None, description=None, version=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=False) passing instead of 'too few arguments'

(I'm switching up the order, because knowledge of this argparse quirk will play a minor role in the last note :)

Thanks for poking at this. I had seen it, but it didn't set off my ~detector.

Looks like a known issue with subparsers: https://bugs.python.org/issue9253. TL;DR: subparser argument is effectively required, and invocations without it present cause the "too few arguments" error.

I don't want to completely discard the message since it might still be a breadcrumb when it comes to getting one of these parsers working, but I guess we can just selectively downgrade the level to debug when there's a subparser present...

I noticed that your commit reverted the changes from 370d6f3 (Allow parsers that don’t handle commands yet, 2023-09-06), and I’m not sure why.

Thanks for pointing out the failing invocation. I'm not really decided yet, so I'll just think out loud here (I'm not looking for a response, but feel free).

I patched it out to see if I can satisfy your use-case without it. It might end up being the best option, but I have some concerns about it that I need to either address or convince myself are non-issues:

It papers over an error that, at least so far, has generally meant there's a real problem:
- the command's weird enough that the parser needs its own handler but one wasn't added or the names aren't lining up
- the command is supposed to go through the generic handler, but we've forgotten to mark the command in the options, tried to do so but typoed it, etc.
It may punch too big of a hole in the existing ~safety model around finding sub-exec. Up to now, the model is effectively that we can let probable execers through as long as we have a parser we believe to be reasonably capable of finding that exec in the arguments. The parser (or a cannot override) basically stands in for a human's assertion that they've reviewed the command and ruled out or isolated the exec behavior.

Some fair fraction of the time I end up having to plot out all of the arguments (or doing so because at some point it's simpler to do/maintain 100% than 90%), but the parser can technically be very minimal as long as it's able to play spot-the-exec.

The sort parser is a good example of this. Since the exec is in a long --compress-program flag, we don't need to enumerate all of the short arguments just to handle combined multi-option short forms.

This logic is turned on its head by carving out known-safe parts. When we're zoomed in at just the scope of the carve-out it can feel equivalent, but when we zoom back out to the whole command it is more or less the inverse. We're making no assertion about the presence/absence of exec in a big fraction of the syntax space that the parser will end up running against.

We can find confidence down this path by carving out all known-safe invocations and rejecting all others, but I think this approach clashes with the existing use of under-specified parsers. To have a similar safety profile, this approach needs to be able to fail all invocations that we aren't asserting the ~safety of.

In this specific case with flatpak the risk and model clash is mostly mitigated by how argparse is handling the subcommands/subparsers as a required positional. Since there's a required positional first argument with only one valid value, every other subcommand will error. But the gap is still technically there before the positional; something like the below can sail through:
```
flatpak --maybe-i-do-exec=sort
```
Since this is an invented option it'll pop at runtime, but if it were real exec resholve would just miss it. This gap is probably ~fine in the flatpak case because you are effectively making the assertion about the safety of the top-level option space of flatpak. I'm less sure if it'll hold for other CLIs.

I need to chew on it a bit more, but basically: I suspect I should try to make this behavior an explicit affirmative property of specific parsers and/or options in a way that clarifies the fundamental difference between the assertions they represent.

Sep 23 '23 20:09 abathur

Just a ping to let you know I haven't forgotten this and thank you again for kicking it off.

I agree with carving out space for this kind of parser and know roughly how I want to approach that, but I held up a little to mull naming. Life/work/other-projects keep conspiring to demand time, but I think (hope?) this is close to the top of the pile again.

Oct 28 '23 20:10 abathur

Ok. I've had a chance to take another swing at this and cover most of your notes in the process. https://github.com/abathur/resholve/commit/c3d8bab19d07bacf56d823beff8b2364d44b758d

This separates these two broad approaches into different parser subclasses. I'm still not certain about the names. I'm also not sure how the deadend subparser should fit in here now; I haven't tried to pick at that behavior much.

Nov 19 '23 07:11 abathur

OK, so I just took a look at c3d8bab. Here are my notes:

All of the commands in this script:

!/usr/bin/env nix-shell
! nix-shell -i bash -p flatpak

latpak -h
latpak --help
latpak --version
latpak --default-arch
latpak --supported-arches
latpak --gl-drivers
latpak --installations
latpak --print-updated-env
latpak --print-system-only
latpak -v
latpak -vv
latpak --verbose
latpak --ostree-verbose

cause this error:

sage: flatpak (generic) [-h] {update} ...
latpak (generic): error: UnlocatedExecCommandParser UnlocatedExecCommandParser(prog='flatpak (generic)', usage=None, description=None, version=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=False) is a strict parser that lacks rules for parsing this invocation, causing this error: 'too few arguments'

I had trouble understanding what the last part of this comment means:
```
 init parser (by calling make_command_parser) with a name and variant,
 i.e. "sed" and "gnu" or "bsd"
```
It’s wasn’t clear to me what "sed", "gnu" and "bsd" refer to. Now that I reread it, I’m realizing that "sed" is an example of a name and that "gnu" and "bsd" are examples of variants. Here’s how I would reword that comment to make it more clear:
```
 init parser (by calling make_command_parser) with a name and variant,
 e.g.
     gnu = make_command_parser("sed", "gnu")
     bsd = make_command_parser("make", "bsd")
```
I like the idea behind LocatedExecCommandParser and UnlocatedExecCommandParser. I’m also unsure about those names, but I don’t really have any better suggestions. Maybe FullyAuditedCommandParser and PartiallyAuditedCommandParser?
I also like the make_command_parser() function. I think that it will make it easier for people who are new to the codebase to understand the WhateverCommandParsers classes.

I'm also not sure how the deadend subparser should fit in here now; I haven't tried to pick at that behavior much.

Could you elaborate? It looks to me like subparsers fit in fine at the moment.

Jan 23 '24 17:01 Jayman2000

All of the commands in this script ... cause this error:

 usage: flatpak (generic) [-h] {update} ...
 flatpak (generic): error: UnlocatedExecCommandParser UnlocatedExecCommandParser(prog='flatpak (generic)', usage=None, description=None, version=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=False) is a strict parser that lacks rules for parsing this invocation, causing this error: 'too few arguments'

I still need to take a closer look at this to make sure I'm both understanding and not mis-remembering, but I think I recall stumbling on an issue report (I think I had the tab open until a tab purge a few weeks back...) about argparse with a subparser throwing this error if invoked without at least one positional argument.

I'll try to find that again in my history and add some of these example invocations to the flatpak parse test file and see if there's anything we can do. Do you happen to have a public example I can mine for legit invocations that shouldn't error out?

I had trouble understanding what the last part of this comment means... Now that I reread it, I’m realizing that "sed" is an example of a name and that "gnu" and "bsd" are examples of variants.

Good point. Now that you poke at it, I realize I'm leaving a lot unsaid about what a variant is in the first place. I'll probably squash them later, but for now I've just added another commit (https://github.com/abathur/resholve/commit/49b76e8b7078b7132583ff0c884fb1cd5cbae861) to actually unpack why "variant" is a concept here. Do you feel like that works?

I’m also unsure about those names, but I don’t really have any better suggestions. Maybe FullyAuditedCommandParser and PartiallyAuditedCommandParser?

I'll stew on these. (At first blush I have a very small qualm with audit here, but I also see the argument that it's closer to to the human-level ~why behind the different parsers. Have to talk myself around it a bit...)

I'm also not sure how the deadend subparser should fit in here now; I haven't tried to pick at that behavior much.

Could you elaborate? It looks to me like subparsers fit in fine at the moment.

I'm not exactly sure what I mean, yet. It's not a concern about subparsers broadly--just the deadend type. IIRC, it was vague handwringing about some bits I wasn't motivated to brain into submission at the time. This may not be a complete list, but something like:

The broad logic of the LocatedExecCommandParser is to be loose/permissive/tolerant around unexpected arguments (we can implement just enough to be able to locate the exec), so I feel like (from that goal) there's an argument that it shouldn't need deadend subparsers. But I haven't implemented any other commands that require subparsers, so I'm also not entirely convinced that we won't need them. (I'm not sure how parse_known_args works with subparsers and haven't yet read or tested enough to shore up my understanding.)
Likewise, the broad logic of the UnlocatedExecCommandParser is to be strict/intolerant of the presence of any arguments we haven't programmed support for, and deadend subparsers have roughly the opposite logic. I suspect deadend subparsers will make it easier to implement any given UnlocatedExecCommandParser--that's part of why I kept using one for the update subcommand here--but I think on some level I'm worried about mixing those two logics making it harder to understand or reason about these? (I may be able to talk myself out of this one being a problem...)

Jan 26 '24 22:01 abathur

I still need to take a closer look at this to make sure I'm both understanding and not mis-remembering, but I think I recall stumbling on an issue report (I think I had the tab open until a tab purge a few weeks back...) about argparse with a subparser throwing this error if invoked without at least one positional argument.

I'll try to find that again in my history and add some of these example invocations to the flatpak parse test file and see if there's anything we can do. Do you happen to have a public example I can mine for legit invocations that shouldn't error out?

Not really. The original reason why I had started implementing a parser for flatpak is because I was using flatpak in this script, but that script only contains two Flatpak commands. I wrote this script that generates commands to test Resholve with, but it doesn’t necessarily generate valid commands.

Good point. Now that you poke at it, I realize I'm leaving a lot unsaid about what a variant is in the first place. I'll probably squash them later, but for now I've just added another commit (49b76e8) to actually unpack why "variant" is a concept here. Do you feel like that works?

Yeah, that commit makes things much clearer.

Jan 27 '24 20:01 Jayman2000

I've revisited this today to see if I could settle the naming issue and finish out the WIP I had here, but it hasn't gone so well.

I found the argparse thread I mentioned before, and it also isn't great news: https://github.com/python/cpython/issues/103520#issuecomment-1518088696

Even if we make the subparsers narg behave more like the ? non-required positional it will have problems. With a sequence of *? positionals, the * is greedy, and leaves nothing for the ?.

In the linked StackOverFlow I show how changing the '*' to a flagged argument sort of works, but we still need to use some other flagged argument to terminate its list of strings.

As long as subparsers is handled a a positional, it will impossible to change the current behavior. Using a * argument is tricky - it has to be last, or its end has to be clearly marked. Another positional will not work.

I'm now realizing there's a conceptual trap with trying to use the partial parsers--since they have to fail everything they can't match, there's no sane way to ~chain them for slightly different purposes as with the full parsers. (This might only be a problem because of this argparse shortcoming, though.)

I'm a little unsure what to do with this case, now. I think we'd have to either patch up argparse somehow (but the quote above makes me think that'll also have its problems), or maybe concede defeat and either patch up this single parser to work around the limitation or build in a custom flatpak parser. I'm not certain, but those both smell like they might waste more time than just auditing flatpak? Unless I find a lightbulb moment, I'm starting to think this was a dead-end in a no-pun-intended sort of way.

I'm a little bummed, but don't take this as me being down on you for bringing it up. I'm glad you did--I probably would've had to go down this path sooner or later just to convince myself it wasn't this easy. Some other silver linings are:

It has increased my conviction that this is just going to be a slog without parser tooling that actually carves these problems up along the right seams for resholve's needs.
It led to some doc and internal API/QoL improvements that I can extract from this work and go ahead and land.

Feb 17 '24 00:02 abathur

I stumbled on some links I saved relating to the argparse bit of this, suggesting that this got better in python3.7:

https://bugs.python.org/issue9253
https://github.com/python/cpython/pull/3027
https://github.com/python/cpython/issues/103520

Feb 23 '24 00:02 abathur