clap [modular] User-driven validation

Please complete the following tasks

[X] I have searched the discussions
[X] I have searched the existing issues

Clap Version

master

Describe your use case

We want to get clap smaller (#1365) and right now validation adds bulk regardless of what parts people use
- We could put parts behind a feature flag but that could end up getting messy
In https://github.com/clap-rs/clap/discussions/3005 / #3135, a user requested specialized validation logic

Describe the solution you'd like

Define one or more App Validation trait

Validate: Takes a built App and ArgMatches as an input, reports errors
Requires: Takes a built App, ArgMatches, and args as input, reports which args are required
(maybe) Default value: Takes a built App, ArgMatches, and args as input, reports default values for them

With maybe a hint(&self, app: &App) -> Option<String> (see #1695 and #3321)

Should it also accept a matches: Option<&ArgMatches> for smart hints?

Steps

Refactor all requires, conflicts, etc APIs to be implemented in terms of this
- Try to overcome weirdnesses in the default_value_if API with its use of Option by clarifying intent through builders
Expose the trait and structs to the user in a clap::validate
Deprecate the old Arg functions so now only those using this API pay the cost due to dead code elimination

Alternatives, if applicable

No response

Additional Context

See also https://github.com/clap-rs/clap/discussions/2832

Risks are usability and performance. The most basic APIs, like required, we probably need to keep baked in. Its the special inter-arg interactions that we should be focusing on.

Nov 09 '21 14:11 epage

I'm not quite understanding what the Validate trait will look like. Is this a project-level trait that will be implemented on a certain Fn / FnMut, meaning users don't need to worry about it, or will it be provided by a clap_validate crate, and users need to include that crate and its trait to get App::validator_all?

Jan 09 '22 04:01 ctrlcctrlv

My general expectation for traits in this issue and related ones (e.g. https://github.com/clap-rs/clap/issues/2683) is they would live in clap (we'd need access to them) and we will have an impl for Fn signatures where they make sense for convinience. We'd work to duplicate clap's validation functionality in structs that implement these traits. These would most likely live in clap, relying on the compiler deleting dead code (this will be easier for a compiler to detect than the current under-used validation code).

All of this would start out behind an unstable-* feature flag as we work to polish it up. Once we felt good with them, we'd remove the feature flag and deprecate the old functionality. On the next major release, we'd then remove the old functionality, pushing people to use the trait-based version.

This is all contingent on the design working out. We might find it just isn't worth completing this.

Jan 10 '22 16:01 epage

Wait, why would clap need access to the trait?

The trait could just fit into clap's builder pattern for App, providing a function for App which modifies App.

Then you could call App::error in the Validator trait impl. clap wouldn't need to know the validator exists so long as App has the right struct members pub or fn's to get them.

Jan 13 '22 07:01 ctrlcctrlv

The proposed validator trait would be called during parse so we have the context of the specific subcommand being processed.

Jan 13 '22 15:01 epage

Forgive me for not knowing the internals as well as you, but couldn't the Validator trait, instead of being called on App, be called on ArgMatches instead?

Perhaps as app.get_matches().validate().

Jan 13 '22 16:01 ctrlcctrlv

It needs access to the relevant App for any extra error reporting it wants to do, like

Checking for additional context for whether validation has failed or not
Showing usage
Showing help if we made this replace ArgsRequiredElseHelp'

Jan 13 '22 17:01 epage

I see. Honestly, as much as I don't like it, perhaps you do have a strong case for pushing App::error on downstream users, and we're just going to have to learn to live with it. ;-)

Jan 13 '22 18:01 ctrlcctrlv

The primary motivation behind this is to see if we can make the cost of validation paid by those who use it rather than everyone (ie clap's existing validation logic like conflicts, special require rules, etc). A secondary motivation is to simplify the APIs for the more complex validation APIs in clap today. All other validation use cases would just be a bonus since we do have solutions like App::error. This might work and be merged; it might fail and we reject this issue.

Jan 13 '22 18:01 epage

Hopefully you can avoid my horrible situation of the past week: adding an API (https://github.com/MFEK/glifparser.rlib/commit/6aaf88012709fce8fd21ccf0b61d1074a95307a3), realizing it's broken and trying to fix it (https://github.com/MFEK/glifparser.rlib/commit/d9aa702df074094f1509f071e15e11cc696a4ca3)…twice (https://github.com/MFEK/glifparser.rlib/commit/9b06a9422b796d25ab2b270abfd51500c6e5857b). Maybe it has no edge cases now, we shall see. Popular software can't afford such…unprofessionalism, I'm aware. :)

Jan 13 '22 19:01 ctrlcctrlv

This would most likely be implemented via

Refactor the existing implementation to use a pub(crate) trait (dog fooding)
Expose it as a pub trait behind a feature flag with a stablization process (e.g. https://github.com/clap-rs/clap/issues/2861) (collect public feedback)
Stablize the feature (point of no return ... until next major version)
On next major version, remove parts of API not needed anymore

Jan 13 '22 20:01 epage

What about relying on serde for custom validation instead? clap would serialise it, and users who would write custom deserialisation, helped with the helper methods of serde. That would a be more modular than clap implementing its own trait and visitor

May 06 '22 15:05 kraktus

I worry serde would too much complexity for most cases since a lot of the cases people are interested in validating would require hand-writing serde traits which can be burdensome while just providing a Vaidator trait in clap that works on an ArgMatches would be relatively trivial to implement. No visitors needed.

May 06 '22 16:05 epage

Not necessarily.

With serde_value, you get a Value type that represents any Serde "object". These can be coerced into the user's structs because they also implement Deserialize.

I think @kraktus' idea is a good middle ground.

May 06 '22 21:05 ctrlcctrlv

With serde_value, you get a Value type that represents any Serde "object". These can be coerced into the user's structs because they also implement Deserialize.

I'm not seeing how that addressed the concern I raised. I'm assuming there is context missing that I could make guesses at but I would prefer to understand what you are getting at then understanding my assumptions :)

The other problem with serde is it assumes Paths are UTF-8 so naive handling of paths will be done incorrectly while clap is wanting to ensure paths are default handled as a bag of bytes rather than utf-8

May 06 '22 21:05 epage

I was thinking that you'd have Deserialize on the whole ArgMatches. It's already normal for me to have an Args struct. If ArgMatches could become a serde::Value then it could become also an Args, right?

May 06 '22 21:05 ctrlcctrlv

How does users writing custom validation fit into that picture? If the user just wants to do a field-level validator, they can use the field attributes to provide their own custom deserializer. If they want to validate the entire struct, which is what this issue is focusing on, then they would need to hand write the entire Deserialize trait which is not pretty to do by hand.

Compare that to just passing in a value that implements Validator as described earlier. That is a large burden for the user to do one vs the other and so doesn't seem a viable way of solving this problem. We also have a custom derive that offers a lot of domain-specific value that can't be had with clap, which lowers the value add of going down this route for custom validation.

That said, it would be interesting for people to support serde support for ArgMatches but I think that would best start out as an external crate so it gives people room to experiment and try out ideas more to see what works or doesn't before pulling it into clap. This will require a newtype due to the orphan rules. Some more features will be needed from ArgMatches which I would like to provide but they are blocked on some v4 changes because of difficulties with our internal Id type.

May 06 '22 21:05 epage

My problem was I was only considering my patch #3029 and not that you wanted to make every case work.

To refresh memories, my patch was focused on validating a single struct field that contains multiple values.

So I could use a Newtype and impl FromStr.

But you're right this won't let you validate the whole struct.

May 06 '22 21:05 ctrlcctrlv

Erm, or I could impl From<Vec<…>> for Newtype.

Then when I call values_of / values_of_os...I could quickly make the returned Vec into my Newtype.

I hadn't considered the other cases, my bad, I forgot this issue was not just about #3029

May 06 '22 22:05 ctrlcctrlv

Where is work being done on this? Is there more discussion that must be had first?

Jun 25 '23 02:06 Easyoakland

For myself, I have tasks on other projects that I feel are higher priority on this. Someone else could do some exploratory work on this but the challenge is this is very open-ended work and someone working on it would need to keep us in the loop on decisions to make sure there is buy-in on the direction being taken.

Jun 25 '23 02:06 epage

It seems like what this is proposing is a validator trait on functions which can take a validator to make new validator. Then use Combinatory_logic to define all the validators. How is this proposed approach different from copying bpaf?

Jun 25 '23 03:06 Easyoakland

How is this proposed approach different from copying bpaf?

In bpaf you would compose parsers instead of validators, see this article for a general idea: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/

Disclaimer, I'm not an expert in clap, but I can try to give some examples according some hopefully idiomatic existing code.

cargo-add needs user to specify the source, so it defines https://github.com/rust-lang/cargo/blob/5b377cece0e0dd0af686cf53ce4637d5d85c2a10/src/bin/cargo/commands/add.rs#L85 a bunch of arguments with some validation - "tag" requires "git", "path" conflicts with "git", etc.

Values later extracted into variables https://github.com/rust-lang/cargo/blob/5b377cece0e0dd0af686cf53ce4637d5d85c2a10/src/bin/cargo/commands/add.rs#L226

And consumed with https://github.com/rust-lang/cargo/blob/5b377cece0e0dd0af686cf53ce4637d5d85c2a10/src/bin/cargo/commands/add.rs#L259

Biggest problem with that is that user code (parse_dependencies) relies on validated invariants, not parsed ones.

With bpaf you would have those values constructed either by derive or construct! macro.

struct Options {
   ... 
   source: Input,
   ...
}

enum Ref {
    Branch { branch: String },
    Tag { tag: String },
     Rev { rev: String },
}

enum Input {
    Crate { name: String },
    File { path: PathBuf },
    Git { uri: String, ref: Option<Ref>  },
}

With parsers and validators operating on strictly typed Rust values.

For hypothetical input validator (let's say against crate blacklist) clap version would have to deal with the same steps - getting fields, assuming some invariants been vitiated earlier (unreachable!("clap should ensure we have some source selected");) and validating them. With bpaf validator would get a val: &Input or val: &Ref depending on where it is placed.

Now say you want to add one extra input type, let's call it "interactive picker".

With clap you modify parser definition and then you have to go over all the places where ArgMatches are used and update them, otherwise things will explode on runtime with unreachable!("clap should ensure we have some source selected");

With bpaf compiler will tell you all the places where match val needs to be updated to handle one more branch.

This becomes a problem when argument parser or parts of it are shared across multiple binaries, but even within a single crate it is easier to work with the right type

enum OutputFormat {
    Intel,
    Att,
    Llvm,
    LlvmInput,
    Mir,
    Wasm,
    McaIntel,
    McaAtt,
}

rather than trying to emulate it using product types:

struct OutputFormat {
    intel: bool,
    att: bool,
    llvm: bool,
    ...
}

Jun 30 '23 16:06 pacak

clap clap copied to clipboard

[modular] User-driven validation

Please complete the following tasks

Clap Version

Describe your use case

Describe the solution you'd like

Alternatives, if applicable

Additional Context

clap
clap copied to clipboard