Fancy errors?
Summary
As noted by @dabrahams in https://github.com/mpalmer/action-validator/issues/72#issuecomment-2086780223, the output of the CLI does not make it easy to identify the location of the error. Because it only provides a path to the offending item, users must manually traverse the file to find the source of the error.
Proposal
Implement Rust-style diagnostic output.
Prototype
I've been experimenting with saphyr (which offers spanned yaml parsing) and annotate_snippets (which provides an ergonomic API for producing Rust-style diagnostic reports), and I've been able to hack together a (very messy) prototype:
diff --git a/src/lib.rs b/src/lib.rs
index 5d4f9d4..38a8bac 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -6,6 +6,7 @@ mod validation_error;
mod validation_state;
use config::{ActionType, RunConfig};
+use saphyr::{LoadableYamlNode, MarkedYamlOwned, SafelyIndex as _, ScalarOwned, YamlDataOwned};
use std::path::PathBuf;
use validation_error::ValidationError;
use validation_state::ValidationState;
@@ -157,6 +158,7 @@ pub mod cli {
fn run(config: &RunConfig) -> ValidationState {
let file_name = config.file_name.unwrap_or("file");
+
let doc = serde_yaml::from_str(config.src);
let mut state = match doc {
@@ -190,6 +192,50 @@ fn run(config: &RunConfig) -> ValidationState {
state.action_type = Some(config.action_type);
state.file_path = config.file_path.map(|file_name| file_name.to_string());
+ let doc = saphyr::MarkedYamlOwned::load_from_str(config.src)
+ .unwrap()
+ .into_iter()
+ .next()
+ .unwrap();
+
+ for error in &state.errors {
+ let mut node = Some(&doc);
+ match error {
+ ValidationError::NoFilesMatchingGlob {
+ code,
+ detail,
+ path,
+ title,
+ } => {
+ for segment in path.split("/").skip(1).collect::<Vec<_>>() {
+ let Some(prev_node) = node else {
+ todo!();
+ };
+ node = prev_node.get(segment);
+ }
+ let node = node.unwrap();
+
+ let report = &[annotate_snippets::Level::ERROR
+ .primary_title(title)
+ .element(
+ annotate_snippets::Snippet::source(config.src)
+ .line_start(26) // FIXME: hard-coded for the test fixture; should be derived from the node's span
+ .path(config.file_path.unwrap_or(file_name))
+ .annotation(
+ annotate_snippets::AnnotationKind::Primary
+ .span(node.span.start.index()..node.span.end.index() - 2) // FIXME: magic `- 2` trims the span; needs a proper fix
+ .label(detail.as_ref().unwrap_or(title)),
+ ),
+ )];
+
+ let renderer = annotate_snippets::Renderer::styled()
+ .decor_style(annotate_snippets::renderer::DecorStyle::Unicode);
+ anstream::println!("{}", renderer.render(report));
+ }
+ _ => {}
+ }
+ }
+
state
}
Limitations
Unfortunately, valico requires a serde_json::Value as input. The serde_json::Value type is un-spanned, and so cannot provide the span information required to produce the desired diagnostic output. For this reason, the above POC parses the file twice: first with serde_yaml to produce errors via valico, then again with saphyr to retrieve the span information used to render the diagnostic messages.
Perhaps unsurprisingly, I was not able to find any JSON schema validators which accept a parsed yaml type with span information (such as produced by saphyr) as input.
Obviously, double parsing isn't ideal because it...
- Adds a performance overhead
- Unnecessarily increases the file size of the binary
- Increases the surface area for bugs
Possible Solutions
I've considered a few options...
1. Accept Defeat
Either temporarily, or permanently, we could accept the downsides of double parsing the YAML file as an acceptable cost for improving the ergonomics of the tool.
Pros:
- Easy - existing implementation is relatively simple
- No need to incur performance overhead if no errors
Cons:
- As mentioned above
2. Contribute a saphyr feature flag to valico
Because both saphyr and serde_json essentially describe the same kind of data, it should be possible to implement saphyr support in valico, e.g. behind a feature flag.
Pros:
- No double parsing
- No performance overhead
Cons:
- Supporting two separate formats adds complexity for valico
- Requires the upstream maintainers' blessing or a fork (the maintainers of valico may not want this feature)
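For illustration, the feature-flag variant could be gated in valico's manifest roughly as follows (hypothetical; valico has no such feature today, and the dependency details below are made up):

```toml
# Hypothetical sketch of valico's Cargo.toml, not a real feature.
[features]
default = []
# Opt-in support for validating saphyr's spanned YAML values.
saphyr = ["dep:saphyr"]

[dependencies]
saphyr = { version = "*", optional = true }
```

Code paths accepting saphyr input would then be guarded with `#[cfg(feature = "saphyr")]`, so existing serde_json::Value users pay nothing for the addition.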
3. Implement conversion from saphyr to serde_json::Value
We can convert saphyr::MarkedYamlOwned to serde_json::Value and pass that to valico for validation.
Pros:
- No double parsing so less scope for bugs
- Relatively simple to implement conversion (already did a PoC for this)
Cons:
- Requires cloning the entire AST
- Some performance impact due to conversion
- Results in runtime overhead even if no errors are logged
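To make the shape of this option concrete without relying on saphyr's exact API, here is a self-contained Rust sketch using toy types: `Spanned` stands in for saphyr::MarkedYamlOwned and `Unspanned` for serde_json::Value (all names are illustrative, not real library types):

```rust
// Toy stand-in for a spanned parse tree: saphyr attaches a source span
// (here, a byte range) to every node; serde_json::Value has no such field.
#[derive(Debug)]
enum Spanned {
    String { value: String, span: (usize, usize) },
    Sequence { items: Vec<Spanned>, span: (usize, usize) },
    Mapping { entries: Vec<(String, Spanned)>, span: (usize, usize) },
}

// Toy stand-in for serde_json::Value: the same structure, minus the spans.
#[derive(Debug, PartialEq)]
enum Unspanned {
    String(String),
    Sequence(Vec<Unspanned>),
    Mapping(Vec<(String, Unspanned)>),
}

// The conversion walks the whole tree once, cloning every scalar and key
// and discarding every span -- this is the "clones the entire AST" cost
// listed under the cons above.
fn to_unspanned(node: &Spanned) -> Unspanned {
    match node {
        Spanned::String { value, .. } => Unspanned::String(value.clone()),
        Spanned::Sequence { items, .. } => {
            Unspanned::Sequence(items.iter().map(to_unspanned).collect())
        }
        Spanned::Mapping { entries, .. } => Unspanned::Mapping(
            entries.iter().map(|(k, v)| (k.clone(), to_unspanned(v))).collect(),
        ),
    }
}

fn main() {
    let spanned = Spanned::Mapping {
        entries: vec![(
            "name".to_string(),
            Spanned::String { value: "CI".to_string(), span: (6, 8) },
        )],
        span: (0, 8),
    };
    // Validation would see only the unspanned tree; the spanned original is
    // kept alongside it so error paths can later be resolved back to spans.
    println!("{:?}", to_unspanned(&spanned));
}
```

The key point is that the spanned original must be retained for the whole validation pass, so the conversion cost is paid up front regardless of whether any errors occur.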
4. Implement spanned parsing for serde_yaml / serde_json / serde
I looked into whether it was possible to add support for span information to our existing dependencies. Long story short, I'm not aware of any way to practically do this in serde at the moment (though I may have missed something), and even if there were a way, it would be necessary to contribute changes to both serde_yaml and serde_json.
Pros:
- Able to use existing dependencies
- Addresses all of the above mentioned limitations
Cons:
- Likely not feasible
5. Implement a "lingua franca" for data formats
At present, there is no agreed-upon format for storing or working with unstructured data. serde_json::Value is used extensively, even, as in our case, for non-JSON formats (we work with YAML). However, serde_json::Value owns its data, and cannot be used in cases where another type is required to own the data.
There are many use cases for working with data which do not depend on the specifics of the underlying type. For example, validation is one such use case. Validation requires only knowledge of the structure of the data that is being represented, not details of the types used to represent it.
A trait could be implemented which acts as a "lingua franca" for working with different data types:
enum Value<'data> {
Null,
String(&'data str),
// ...
}
trait DataFormat<'data> {
type Format: DataFormat<'data>;
fn safe_get<TKey: AsRef<str>>(&self, key: TKey) -> Option<&Self::Format>;
fn as_value(&self) -> Value<'data>;
// ...
}
This can be implemented by any type, including serde_json::Value, serde_yaml::Value, saphyr::MarkedYamlOwned and others, and accepted by validation libraries such as valico.
This approach is inspired by standard-schema which takes a similar approach to solving an unrelated ecosystem fragmentation issue in the TypeScript space.
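To make this more concrete, here is a self-contained sketch of how such a trait could be consumed by a validator and implemented for a parse-tree type. Everything below (the `Toy` type, `require_string_at`, and the simplified `safe_get` signature) is illustrative, not an existing API:

```rust
// A borrowed, format-agnostic view of a single node.
enum Value<'data> {
    Null,
    String(&'data str),
    Sequence, // elided: iteration over children
    Mapping,
}

// The "lingua franca" trait: any parsed-data type can implement it.
trait DataFormat<'data> {
    // Mapping-key lookup; None for missing keys or non-mapping nodes.
    fn safe_get(&'data self, key: &str) -> Option<&'data Self>;
    // A borrowed view of this node, independent of the backing type.
    fn as_value(&'data self) -> Value<'data>;
}

// Toy parse-tree type standing in for serde_json::Value / saphyr::MarkedYamlOwned.
enum Toy {
    Null,
    String(String),
    Mapping(Vec<(String, Toy)>),
}

impl<'data> DataFormat<'data> for Toy {
    fn safe_get(&'data self, key: &str) -> Option<&'data Self> {
        match self {
            Toy::Mapping(entries) => entries.iter().find(|(k, _)| k == key).map(|(_, v)| v),
            _ => None,
        }
    }

    fn as_value(&'data self) -> Value<'data> {
        match self {
            Toy::Null => Value::Null,
            Toy::String(s) => Value::String(s),
            Toy::Mapping(_) => Value::Mapping,
        }
    }
}

// A validator written once against the trait works for *any* implementing format.
fn require_string_at<'d, F: DataFormat<'d>>(doc: &'d F, key: &str) -> Result<&'d str, String> {
    match doc.safe_get(key).map(|node| node.as_value()) {
        Some(Value::String(s)) => Ok(s),
        _ => Err(format!("expected a string at `/{key}`")),
    }
}

fn main() {
    let doc = Toy::Mapping(vec![("name".to_string(), Toy::String("CI".to_string()))]);
    println!("{:?}", require_string_at(&doc, "name"));
    println!("{:?}", require_string_at(&doc, "on"));
}
```

The payoff is in `require_string_at`: it never touches the concrete document type, so the same validation logic would work whether the input came from serde_json, serde_yaml, or saphyr.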
Pros:
- No double parsing
- No performance overhead
- No increase in binary size
- No hacky glue code required in action-validator
Cons:
- Outside the scope of this project
- Relies on upstream maintainers accepting this change (at least valico, or another JSON schema validation crate, would need to "buy in" to this approach)
I'm interested to hear thoughts on this and the above issues/solutions. If there's interest for this feature, I'd be happy to work towards this.
Rust diagnostic output is extremely problematic for IDEs that try to handle diagnostics from a wide variety of sources (like emacs). It requires a massive hack to get an acceptable user experience. I beg of you, please please use the gnu error formatting standard, which will already be supported everywhere.
Hey @dabrahams, thanks for the valuable feedback!
I understand the concern, and your proposed alternative makes sense for IDEs. However, Rust-style errors, while difficult to parse, have several advantages for humans. Based on my reading of the linked page, the GNU error formatting standard doesn't allow for:
- Rich, multi-line errors
- Annotated code snippets
- Displaying messages for multiple related spans (e.g. "first used here")
- Contextual information such as hints or suggestions
That said, I know tools like GCC do output some of this information while keeping the first line of each error compliant. Is the output from e.g. GCC considered compliant with the standard? Does it cause similar headaches for IDEs like emacs?
Genuinely curious to hear your thoughts about this @dabrahams, as I'm no expert in IDEs or error formatting conventions.
I guess my thoughts are:
- Yes, we should take steps to ensure diagnostic information is widely usable (including by IDEs if there is a demand for that)
- We should not compromise on human readability
If those two goals are incompatible, we could go the way of other tools e.g.
$ action-validator --output=human ... # default
$ action-validator --output=gnu ...
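For reference, the GNU coding standards prescribe only the shape of a diagnostic's first line, which is trivial to emit. A minimal sketch (the function and its callers are illustrative, not part of action-validator):

```rust
// Formats a diagnostic's first line per the GNU coding standards:
// "file:line:column: message". The "error:" severity prefix is a common
// convention (used by GCC and Clang) rather than a requirement.
fn gnu_diagnostic(path: &str, line: usize, column: usize, message: &str) -> String {
    format!("{path}:{line}:{column}: error: {message}")
}

fn main() {
    // Hypothetical --output=gnu rendering of a schema error.
    println!(
        "{}",
        gnu_diagnostic(".github/workflows/ci.yml", 26, 9, "property `uses` is not expected here")
    );
}
```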
Some big and very awesome ideas here, @bcheidemann. Thank you for the write up. I've never been happy with how action-validator presents errors, but I've also never been sufficiently irritated by it to dedicate any time to digging in to fixing it. I'm glad (if that's the right word) that you have been sufficiently irritated. 😀
My "gut check" on each of the approaches you've suggested, based on no more than about 30 seconds' thought, is:
- "Accept defeat" and double-parse: yeah, not ideal, but a-v isn't a performance critical tool (as far as I know), so this wouldn't be the end of the world.
- saphyr support in valico (via feature flag or otherwise): if valico were up for accepting such a thing, it would certainly seem to be a good approach. Presumably others could potentially benefit from spanned parsing, too, so it might not be a hard sell, but I'm sure it's a non-trivial amount of work. At the very least it seems worthwhile to reach out and take their temperature.
- Conversion from saphyr to serde_json::Value: pros/cons seem similar to double-parsing; in both cases there's a performance hit, and which one is "worse" might not be obvious (or it might, I really don't know). The deciding factor between 1 and 3 is probably going to be as much about implementation effort as anything else.
- Spanned parsing in serde: the completely impractical uber-nerd in me likes this as the "most correct" solution, but hoo boy, it would be a huge amount of work, and could take literal years to get serde to adopt it, if they ever would.
- Common format for parsed data: this is a real "boil the ocean" approach, and touches on something that has always annoyed me about serde: every format provides its own Value type (or equivalent), when the parsed output should really be a generic serde-provided type. In any event, it feels like it'd be a huge amount of work, no matter where it landed, and yeah, it'd need buy-in from lots of people to make it valuable, which makes it a long-term approach, at best.
At a glance, if valico wanted saphyr support and it wasn't a lot of effort, that seems like the ideal approach; otherwise, whichever of 1 or 3 is easiest to do seems the best approach.
As far as error-reporting formats go, it would definitely be good to have IDE compatibility as far as possible, but as humans are the lowest common denominator in parsing capability, human-readability is really important (he says, having written something that produces absolutely incomprehensible error messages...). It seems like you've got a lot of domain expertise in tool compatibility, @dabrahams, which I (and by the look of it, Ben) don't have to nearly the same degree, so your input to keep things tool-friendly as well will be invaluable.
- "Accept defeat" and double-parse: yeah, not ideal, but a-v isn't a performance critical tool (as far as I know), so this wouldn't be the end of the world.
Agreed.
saphyr support in valico (via feature flag or otherwise): if valico were up for accepting such a thing, it would certainly seem to be a good approach. Presumably others could potentially benefit from spanned parsing, too, so it might not be a hard sell, but I'm sure it's a non-trivial amount of work. At the very least it seems worthwhile to reach out and take their temperature.
I've had a look at the valico codebase and I think this would, unfortunately, be quite a significant undertaking, while also maintaining support for serde_json::Value. I imagine dropping serde_json::Value support would not be an option as it would be a huge breaking change.
I could see two possible approaches for adding saphyr support to valico:
- Feature flag - compile different code depending on a cargo feature.
- A trait which implements the operations required by valico and can be implemented by both serde_json::Value and saphyr - but this is only a small step away from option 5, which could be a big win for the ecosystem if it gains adoption.
- Conversion from saphyr to serde_json::Value: pros/cons seem similar to double-parsing; in both cases there's a performance hit, which one is "worse" might not be obvious (or it might, I really don't know). The deciding factor between 1 and 3 is probably going to be as much about implementation effort as anything else.
Broadly agree. The downside compared to option 1 is that we always incur the overhead of conversion. With option 1, we can incur that cost only if it's needed - i.e. if validation passes, we don't pay.
- Spanned parsing in serde: the completely impractical uber-nerd in me likes this as the "most correct" solution, but hoo boy it would be a huge amount of work, could take literal years to get serde to adopt it, if they ever would.
Agreed.
- Common format for parsed data: this is a real "boil the ocean" approach, and touches on something that has always annoyed me about serde: every format provides its own Value type (or equivalent), when the parsed output should really be a generic serde-provided type.
Yes, that's always bothered me too, in particular because serde_json::Value gets used as the de facto serde value type even when the use case is not inherently JSON-related. It would be nice if there were a serde::Value. However, I don't know if that's practical: the wide scope of serde may make it hard to agree on the details of that type; what's required for e.g. JSON may be vastly different from other data formats.
In any event, it feels like it'd be a huge amount of work, no matter where it landed, and yeah, it'd need buy-in from lots of people to make it valuable, which makes it a long-term approach, at best.
Yes, it's certainly quite a lot of work. That said, certainly less than option 4, and perhaps not so much more than option 2... After all, the hypothetical trait can be implemented without buy-in from serde_json or saphyr, and the implementation of the trait for something like serde_json or saphyr would be fairly trivial (I already did a prototype to see if it's feasible and start feeling out the rough edges). The main challenges are (1) designing and implementing such a trait and its associated standard Value type, and (2) adding support to valico (which is about the same effort as option 2).
At a glance, if valico wanted saphyr support and it wasn't a lot of effort, that seems like the ideal approach
I'd be interested to gauge valico's interest in options 2 and 5. If I were the maintainer of valico, I'd rather add support for a common trait which allows any unstructured data type to be validated than add support for saphyr specifically, potentially opening myself up to a litany of "add support for some_other_crate::Value" PRs/issues and being on the hook for maintaining those forever.
... otherwise whichever of 1 or 3 is easiest to do seems the best approach.
My gut feel is that options 1 and 3 are the only realistic options for getting this done in a reasonable time frame, since they don't require changes to other crates. Both have pros and cons, and I don't have a strong opinion either way. I'd summarize the main trade-offs as:
- Option 3 requires fewer resources in the error case, because converting the already-parsed tree is cheaper than parsing the file a second time.
- Option 1 requires fewer resources in the success case, because the (more expensive) spanned parse can be skipped entirely when validation passes.
As far as error-reporting formats go, it would definitely be good to have IDE compatibility as far as possible, but as humans are the lowest common denominator in parsing capability, human-readability is really important (he says, having written something that produces absolutely incomprehensible error messages...).
Agreed on both counts.
It seems like you've got a lot of domain expertise in tool compatibility, @dabrahams, which I (and by the look of it, Ben) don't have to nearly the same degree, so your input to keep things tool-friendly as well will be invaluable.
Yes, please! It's a non-trivial change so the more feedback we can gather on the implementation and possible use cases, the better.
Btw, valico appears to be unmaintained since 2023 (or at least there have been no commits or new versions published), so it might not be possible to get buy-in from the maintainer for any required upstream changes... There are other options for JSON schema validation, such as jsonschema, but adopting these would further increase the scope of this already quite significant change.
- Rich, multi-line errors
- Annotated code snippets
- Displaying messages for multiple related spans (e.g. "first used here")
- Contextual information such as hints or suggestions
The gnu standard allows for all those things. The only thing it restricts is how you report line and column information and where the primary error message goes. You can put as much text after that line as you want, and several tools do. You can have supplementary notes with their own line and column information after the primary error. You can look at g++ for example, whose diagnostic requirements are at least as complex as Rust's. I believe clang also uses this format, at least given particular command-line flags. If you pass -diagnostic-style llvm to the Swift compiler you get the GNU format.
And let me add, the ability to quickly jump to and between errors while seeing the code they point to during development is a very human concern. Approximately no humans read error messages outside of an IDE. If there's truly some advantage to presenting humans a different view of that little bit of information that's specified by the GNU standard, give the IDEs the normalized format they can digest and allow IDE functionality/plugins to munge those standardized parts for presentation.
Follow-up: I can see how you might read the standard as restricting your ability to put error information outside that line, but their standard isn't saying you can't write “auxiliary information” to stderr. Regardless of whether we feel that's bending the rules too far, the point here is to be compatible with a format that tools already recognize, and they all recognize the output from those compilers.
@dabrahams Thanks for clarifying! I'm sure we can at least wrangle annotate-snippets into producing a GNU-compliant first line, or find another suitable crate which allows both helpful terminal output and GNU-compliant error messages.
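For concreteness, a GNU-format rendering of an action-validator error plus a supplementary note might then look something like this (entirely hypothetical output):

```
.github/workflows/ci.yml:26:9: error: property `uses` is not expected here
.github/workflows/ci.yml:24:3: note: inside the step defined here
```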