resholve Ease path to adding new external command parsers

I'd like to see a way to .override resholve from nixpkgs to effectively throw some arbitrary extra code into ExternalCommandParsers. The override should, of course, also affect resholve.writeScript and the like. If this could be made part of the "solution" rather than an override, all the better.

It's a comparatively easy-to-implement relief valve that at least lets a user handle the situation where a command does execute its arguments and there's no parser for it in resholve. Currently there's no way forward in that case other than:

- externally resolving the relevant command somehow and tricking resholve into thinking resolution isn't necessary
- maintaining a patch file to apply to resholve during build
- maintaining a fork of resholve

Longer-term, it would be good to develop some kind of simple language for users to describe a program's argument structure that covers most cases and is a more stable interface, but at least this works as a near-universal fallback for users willing to put in the effort.

Jul 04 '22 12:07 tejing1

I've been hoping this day (i.e., someone asking this question) would come. :)

I'll be chewing on this, but I'll sketch out why it doesn't exist yet:

The task is still fraught/messy. This makes me hesitant to spill a lot of complexity on people who may think they're undertaking something simple. Especially since we might have to make N rounds of breaking changes (until we've wrestled with enough real-world complexity to create a solid pluggable abstraction).

Some commands just need a straightforward ~ArgParse parser with a magic name for picking out the command--and I have wanted to either make those pluggable with config (ideal) or code. The regular structure of these in the code might under-sell how much complexity is still lurking here. Some sources of that complexity:
- Early on, I thought I might be able to close over differences between Linux/BSD variants. This is mostly true, so there are a lot of combined parsers like: https://github.com/abathur/resholve/blob/45583b78f3b6427a996ecbc306afefe8fdb1ace7/resholve#L1608-L1641 It didn't take long to start hitting cases where the syntax conflicts in some way that entails a separate parser like: https://github.com/abathur/resholve/blob/45583b78f3b6427a996ecbc306afefe8fdb1ace7/resholve#L1782-L1822. So, now resholve tries a sequence of parsers and takes the first that finds a sub-command.
  
  This is working in practice, but I'm not sure we won't find cases where both forms would match (but mis-identify) sub-commands in invocations meant for the other form.
- A few commands have some kind of syntactic complexity that argparse can't handle. Examples so far include find and sed.
  
  https://github.com/abathur/resholve/blob/45583b78f3b6427a996ecbc306afefe8fdb1ace7/resholve#L1371-L1424
- I've found at least one command with mutually-exclusive syntax forms that affect where the sub-command appears in the invocation:
  
  https://github.com/abathur/resholve/blob/45583b78f3b6427a996ecbc306afefe8fdb1ace7/resholve#L1678-L1704
- Some commands have syntax that entails a more sophisticated/creative handling step. IIRC the main examples of this that I've handled so far are sed, awk, and dc.
  
  https://github.com/abathur/resholve/blob/45583b78f3b6427a996ecbc306afefe8fdb1ace7/resholve#L3559-L3768
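
As a rough illustration of the "try a sequence of parsers and take the first that finds a sub-command" approach described in the first bullet, here is a minimal Python sketch. The `runwith` command, its two syntax forms, and the names `NoExitParser`, `flag_form`, `positional_form`, and `find_subcommand` are all hypothetical; none of this is resholve's actual code.

```python
import argparse


class NoExitParser(argparse.ArgumentParser):
    """argparse parser that raises on bad input instead of exiting."""

    def error(self, message):
        raise ValueError(message)


def flag_form():
    # Hypothetical form A: `runwith -e COMMAND`
    parser = NoExitParser(prog="runwith", add_help=False)
    parser.add_argument("-e", dest="command", required=True)
    return parser


def positional_form():
    # Hypothetical form B: `runwith FILE COMMAND`
    parser = NoExitParser(prog="runwith", add_help=False)
    parser.add_argument("file")
    parser.add_argument("command")
    return parser


def find_subcommand(words, parsers):
    """Return the sub-command from the first parser that accepts `words`."""
    for make_parser in parsers:
        try:
            namespace, _extras = make_parser().parse_known_args(words)
        except ValueError:
            continue  # this form does not match; try the next one
        if namespace.command:
            return namespace.command
    return None


PARSERS = [flag_form, positional_form]
print(find_subcommand(["-e", "ls"], PARSERS))       # ls
print(find_subcommand(["out.log", "ls"], PARSERS))  # ls
```

Because the first matching form wins, ordering matters, and an invocation that both forms accept would be attributed to whichever parser comes first; that is the mis-identification risk noted above.
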
I originally intended (and tried) to build a pressure-relief valve here alongside lore. The goal was to enable users to manually specify an "assay" that just explicitly told resholve which argument was executable. In order to sort out the logistics of actually substituting these (i.e., when and where in the process) I stubbed out a proof-of-concept based on specifying the word-number: https://github.com/abathur/resholve/blob/1e34e32f42374e6000b997915df5ee844e286f6e/tests/behavior.bats#L163-L219

When I turned to mulling how to make a ~syntax for triaging each invocation that would be easy to understand, work with, and not a PITA to maintain as the target script evolved, I started to suspect the triage approach might be a mis-feature. (Since I want resholve to carry the most-common of these, I want to avoid making a mechanism that is clumsy, a little miserable to use, and inadvertently burns energy that would be better spent upstreaming support for additional commands.)
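
To make the maintenance concern concrete, here is a minimal sketch of what word-number substitution might look like. The function name, the 0-based indexing, and the `/opt/tools/bin/ls` path are assumptions for illustration, not the format used in the linked proof-of-concept.

```python
import shlex


def substitute_word(invocation, word_number, resolved_path):
    """Replace the word at `word_number` (0-based) with `resolved_path`."""
    words = shlex.split(invocation)
    if not 0 <= word_number < len(words):
        raise IndexError("word_number is outside the invocation")
    words[word_number] = resolved_path
    return shlex.join(words)


# The rule "word 3 is the executable" works for the original invocation...
print(substitute_word("watch -n 5 ls", 3, "/opt/tools/bin/ls"))
# watch -n 5 /opt/tools/bin/ls

# ...but silently rewrites the wrong word once the script gains a flag.
print(substitute_word("watch -d -n 5 ls", 3, "/opt/tools/bin/ls"))
# watch -d -n /opt/tools/bin/ls ls
```

The second call shows why a position-based rule is awkward to maintain: a small change to the target script moves the executable without the rule noticing.
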
I've been implementing these external-executable-intel bits in a fairly centralized way (parsers in resholve, resholve's Nix API invoking binlore to do the binary analysis) for expedience.

But, I don't love how far knowledge about the package (in the form of lore, lore overrides, and parsers) is from the package. In the long term we'll probably need extra mechanisms to manage complexity that comes with centralizing it (ways to disambiguate version/variant differences, deal with drift between rules and the upstream features, etc.). If it was in the package, resholve wouldn't need to worry about whether it's the BSD or GNU version, or whether there's a version-number difference.

I don't want to actually pursue that yet, but a good way to prepare would be figuring out a humane no-code format that can flexibly express either the full command syntax or which command arguments are executable with enough fidelity to reliably disambiguate them. (Or figuring out with some confidence that it just isn't tractable.)

Edit (April 2024): my thinking on the last paragraph above has evolved a good bit since I wrote it. Some notes:

I still want something here, but I'm not sure if it can be a true no-code format.
Some of the most-troublesome cases here involve commands that make dynamic decisions about how to parse their arguments as they go. (I.e., the semantics of an arg/flag or the number of words consumed in some position may change when they encounter some flag/arg.)
Perhaps it could still be fairly declarative; maybe something graph-based that can express conditional logic is enough to handle this?
A great solution might have to handle at least 3 more kinds of complexity:
- If the parsers live on/in the packages they describe, I previously speculated that we might be able to get away without needing a framework for disambiguating clashes between, say, BSD/GNU or backwards-incompatible version changes. As I've mulled this, though, I've come to wonder if this disambiguation is an ideal property of the best implementation (regardless of where the parser definition lives)? It's what we'd want in order to be able to throw an error if we're trying to use a GNU-only flag with the BSD variant, or use a removed flag with a version it's been removed in. I haven't convinced myself that this is table-stakes--this type of problem isn't currently in scope for resholve, and I think it's defensible to say it's out of scope. But, if we find an ~inspired way to achieve it, I think it could ultimately unlock a lot of use cases (which might help attract hands/eyes to help build and maintain the parsers).
- While at the moment we're most-concerned with executable arguments, there are other things (like environment variables and perhaps some static/relative paths outside of the store) under user control that any given command could potentially exec. If the scope of the effort was more like, "describe the external influences on how this runs", it might make sense to tackle these? (Command-line arguments and envs are two partly overlapping types, also config files/directories and any other path that could affect it.)
- Since this project is Sisyphean at best, I think it makes sense to do a little work up front to ensure that it supports the greatest number of use-cases that it can bear without overcomplicating it. It should probably be abstract enough that it can have implementations in most languages someone could plausibly want to write or parse CLI invocations from. It should probably be able to drive completion-generators. Once complete, it should probably be able to replace a command's own parsing routines. If it can do this, it should probably be able to do tricks like: compile down to something that supports only the ~edge spec for the command; merge in parsers that model other variants of the command and be able to run in a test mode that warns/errors when maintainers are adding a flag that'll clash with other popular implementations; keep you from re-using arguments with different semantics without swearing you aren't an idiot; either generate or force you to write deprecation/migration notes for people who run no-longer-supported arguments as long as they aren't shadowed by newer args; etc.
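
To illustrate the dynamic-parsing concern raised in these notes (one flag changing how many words a later flag consumes), here is a small sketch of a declarative, graph-like spec in Python. The flags, the mode names, and the `SPEC` layout are hypothetical and only meant to show that a table of modes and transitions can express this kind of conditional logic without arbitrary code.

```python
# Hypothetical spec: each mode maps a flag to (words_consumed, next_mode).
# Modes are the graph's nodes; flags are its edges.
SPEC = {
    "start": {"-q": (0, "start"), "-n": (1, "start"), "--pairs": (0, "pairs")},
    "pairs": {"-q": (0, "pairs"), "-n": (2, "pairs")},
}


def command_index(words, spec=SPEC, mode="start"):
    """Return the index of the first word that is neither a flag nor a flag's value."""
    i = 0
    while i < len(words):
        rule = spec[mode].get(words[i])
        if rule is None:
            return i  # not a known flag in this mode: treat it as the command
        consumed, mode = rule
        i += 1 + consumed
    return None


print(command_index(["-n", "5", "ls"]))                  # 2
print(command_index(["--pairs", "-n", "5", "6", "ls"]))  # 4
```

A static description that always lets `-n` consume one word would pick `6` as the command in the second invocation; tracking the mode is what avoids that.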

Jul 04 '22 20:07 abathur

I'm trying to figure out how to deal with nixos-enter, which I think is related to this issue since nixos-enter --command 'echo foo' and nixos-enter -- echo foo can execute commands and resholve/binlore don't have rules for it.

I started down the path of execer = [ "can:${pkgs.nixos-install-tools}/bin/nixos-enter" ];, but then got stuck because I couldn't figure out how to teach resholve about nixos-enter's use of echo. So is the only user-facing escape hatch currently to lie to resholve with cannot, like execer = [ "cannot:${pkgs.nixos-install-tools}/bin/nixos-enter" ];?

I'm happy to make a binlore issue for nixos-enter or something along those lines, but I'm not entirely sure I understand the situation correctly so I thought I'd start with a comment here.

Dec 28 '22 22:12 chasecaleb

> I'm trying to figure out how to deal with nixos-enter, which I think is related to this issue since nixos-enter --command 'echo foo' and nixos-enter -- echo foo can execute commands and resholve/binlore don't have rules for it.

A comment here is a good place to start.

I haven't used nixos-enter, but a quick look suggests that the commands would run in a chroot--are the inputs to resholve guaranteed to be available inside it? (I guess the "bad" case would be if only system packages in the config specified via the --system option are available?)

If everything is available, it can probably be tractable w/ a bit of work. Its options smell like a good example of why staking out a user-facing mechanism is tricky (and why I'm taking it slow...). Are you able to link to the script you need to resholve?

IIUC the -- form would treat echo as an external command (which resholve can handle, but it takes adding a parser).

The --command form appears to run in a bash shell session where echo would refer to the shell builtin. I remember implementing a ~generic shell parser for things like bash -c, but I don't remember wrestling with the nested-shell-script-scope that a principled implementation would require.

> I started down the path of execer = [ "can:${pkgs.nixos-install-tools}/bin/nixos-enter" ];, but then got stuck because I couldn't figure out how to teach resholve about nixos-enter's use of echo. So is the only user-facing escape hatch currently to lie to resholve with cannot, like execer = [ "cannot:${pkgs.nixos-install-tools}/bin/nixos-enter" ];?

Correct.

This isn't an oversight, just one of the trickiest features on the roadmap to get right.

I briefly thought we could satisfice w/ a way to specify positions to resolve and replace--but after playing with a draft implementation I decided it was a misfeature. (Basically a worse version of patch+substitute* that is harder to use/understand and part of resholve's maintenance burden...)

The other ~obvious approach is enabling people to write parsers without having to upstream them to resholve. That does need to happen as soon as feasible, but I'm not happy with how easy it is to write and maintain the existing parsers and don't want to make it harder to change until I can either fix it or make my peace w/ its sharp corners.

Dec 29 '22 01:12 abathur

nixos-enter is a chroot; the external system is no longer available for programs inside, at least not at the original file path. So really, resholve shouldn't be resolving anything inside the command it's configured to run. It's just a string, from resholve's perspective.

Dec 29 '22 01:12 tejing1

@tejing1 and @abathur you're both right about nixos-enter specifically being an edge case due to using chroot, thanks for pointing that out.

> The other ~obvious approach is enabling people to write parsers without having to upstream them to resholve. That does need to happen as soon as feasible, but I'm not happy with how easy it is to write and maintain the existing parsers and don't want to make it harder to change until I can either fix it or make my peace w/ its sharp corners.

Okay that makes sense, thanks.

Dec 29 '22 17:12 chasecaleb

Documenting something I should've noted a while back.

A few months ago one of the Fig co-founders suggested on HN that Fig's autocomplete format might help (https://news.ycombinator.com/item?id=32862163).

Their format is somewhat documented in https://fig.io/docs/reference/arg and the other bits of the reference section in the sidebar. Some likely problems:

- I searched for "isCommand" in their autocomplete repo and have low confidence that their collected autocompletion definitions are currently robust enough for resholve's needs.
- I didn't get an answer back re: how well what they're doing copes with thorny things resholve has to be able to handle (like combined no-argument short flags terminated by a short flag that does take an option), but it might be a tree worth barking up again (whether that's asking them, playing with fig directly, or figuring out if there's actually a parser in it, etc.).
- Their spec format is less dynamic than the formats I'm aware of for working shells--but the autocomplete focus still means that their specs use some dynamic behavior. If they were high-quality enough for our needs here, I imagine we'd still need to exclude those dynamic parts? I haven't audited deeply enough to know if that's going to be intrinsically fraught (i.e., no way to exclude them without knocking out support for some arguments, etc.).
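
For readers unfamiliar with the "combined no-argument short flags terminated by a short flag that does take an option" case, here is a minimal sketch using Python's argparse as a stand-in. The `demo` command and its flags are hypothetical; the point is only to show the shape of invocation that any borrowed spec format would need to parse correctly before the sub-command can be located.

```python
import argparse


def build_parser():
    # Hypothetical command: `demo [-a] [-b] [-f VALUE] COMMAND`
    parser = argparse.ArgumentParser(prog="demo", add_help=False)
    parser.add_argument("-a", action="store_true")  # takes no value
    parser.add_argument("-b", action="store_true")  # takes no value
    parser.add_argument("-f")                       # takes one value
    parser.add_argument("command")
    return parser


# `-abf out.txt` means `-a -b -f out.txt`; the command is the word after it.
ns = build_parser().parse_args(["-abf", "out.txt", "ls"])
print(ns.a, ns.b, ns.f, ns.command)  # True True out.txt ls
```

A parser that does not know `-f` takes a value would read `out.txt` as the command here, which is why the fidelity of the collected definitions matters.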

Edit: Fig.io will be closing down, so I imagine there's a nonzero chance their repos stop improving. Unless the Fig community decides to fork-and-maintain Fig in some way, the prospect of having a resource (even if imperfect) that is maintained by a large community with many eyes seems grim.

Announcement for context:

Fig is sunsetting, migrate to Amazon CodeWhisperer

Dear Fig users,

Effective September 1, 2024 we will be ending access to Fig.

We encourage users to migrate to Amazon CodeWhisperer for command line. It’s free on the Individual tier and is designed to be faster and more reliable than Fig. To make this transition as easy as possible, users can upgrade to CodeWhisperer for command line directly from the Fig dashboard.

To learn more about the changes to Fig and how to export your data, read our blog post.

With hundreds of thousands of users, 22k GitHub stars, 13k Discord members, 400+ open source contributors, and 5 products, we are incredibly proud of what we accomplished. We are incredibly thankful to our community for their support, and we are excited to continue our journey with you at Amazon as part of Amazon CodeWhisperer.

Brendan, Matt, and the Fig team


Mar 12 '23 18:03 abathur