jsonpath-rust icon indicating copy to clipboard operation
jsonpath-rust copied to clipboard

Creating infallible queries

Open thehiddenwaffle opened this issue 8 months ago • 16 comments

Let me start by saying this crate is awesome!

Use case similar to https://github.com/besok/jsonpath-rust/issues/92

I am working on a project that will involve using json paths, and those paths will be coming from a huge set of static strings. Because I have this set of static strings I would like to be able to perform compile time checks on them to ensure valid json path queries so that I don't have to deal with fallibility in the downstream code. To this end my goal is to simply make a macro that converts them to JpQuery structs, throwing a compiler error if any would fail to be parsed by parse_json_path.

I started looking in the source code and found js_path_process, which I could use in a new trait roughly as follows:

pub trait JsonPathCompiled: JsonPath {
    fn query_with_path_checked(&self, q: &JpQuery) -> Vec<QueryRef<Self>> {
        js_path_process(q, self).expect("When can this fail??")
    }

    fn query_only_path_checked(&self, q: &JpQuery) -> Vec<QueryPath> {
        js_path_process(q, self).expect("When can this fail??").into_iter()
            .map(|r| r.path())
            .collect::<Vec<_>>()
    }

    fn query_checked(&self, q: &JpQuery) -> Vec<&Self> {
        js_path_process(q, self).expect("When can this fail??").into_iter()
            .map(|r| r.val())
            .collect::<Vec<_>>()
    }
}
impl JsonPathCompiled for serde_json::Value {}

To my understanding the only notion success or failure in the original RFC for a valid path is either [some_stuff] for a match or the empty array [] for no matches found.

After reading #92, my real question becomes: is the Err match arm reachable code and under what circumstances would that arm be taken? I tried to dig into the source code but I'm still a bit new to rust and even a simple example would help a lot, if needed then maybe I can just "ban" that edge case with macros somehow.

thehiddenwaffle avatar May 08 '25 15:05 thehiddenwaffle

I should add that normally I wouldn't bother the creator of a crate with something that's so off topic to the original concept of the crate but I was thinking of creating a downstream crate off of these concepts, or if this is something that you would like to integrate into this crate I would be more than happy to do a pull request.

thehiddenwaffle avatar May 08 '25 15:05 thehiddenwaffle

https://github.com/besok/jsonpath-rust/issues/92 is very similar, though I think my use case and many others would benefit from the ability to run pre-parsed queries without the possibility of failure.

thehiddenwaffle avatar May 08 '25 15:05 thehiddenwaffle

Thank you for good words!

You can achive the compilation check using procedural macros. The idea is interesting and definitely can be benefitial. If you want to tackle it and provide a pr it would be great and very welcomed.

Regarding examples of the error cases you can take a look into rfc9535 folder (it is a test suite for js path rfc doc) and especially here so all cases with the invalid_selector = true will be examples of how to handle it.

besok avatar May 09 '25 12:05 besok

Excellent, despite being relatively new to Rust I have been quite obsessed with how much proc macros can do and have written quite a few, though I can't promise it will be the most clean code on the planet.

My thought was to craft it to look something like

let q: JpQuery = json_query!($[?$.event.userId == $.context.id]);

My rationale for not using quotes is that the token tree used by JsonPath to me appears to be a strict subset of the rust one(and also doesn't seem to require any whitespace?). One of the benefits of not using quotes would be allow for the macro to have spanned errors at some point in the future that point straight to the invalid token.

thehiddenwaffle avatar May 09 '25 13:05 thehiddenwaffle

I guess I'm still curious as to what situation could cause the actual running of the query to fail like (slightly later in that test)[https://github.com/besok/jsonpath-rust/blob/244188496a5752be7e1bc19b2a9dd98736660ed3/rfc9535/src/suite.rs#L74]. If we could define that situation I could create a second macro that would not allow the "bad" situation and the crate could feature a query_unchecked(q: SafeJpQuery) -> Vec<QueryRef<'a, T>> that can safely guarantee a value

thehiddenwaffle avatar May 09 '25 13:05 thehiddenwaffle

Also the standard for most crates that I've seen seems to be adding the proc macros as {crate_name}_impl or {crate_name}_derive at the crate root, is there something you would prefer over that?

thehiddenwaffle avatar May 09 '25 13:05 thehiddenwaffle

My thought was to craft it to look something like ...

It looks a bit challenging but feel free to try. The main pitfall here is you need to reproduce the parsing process on the level where macro operates. Anyway, don't hesitate to use this issue thread for that.

Also the standard for most crates that I've seen seems to be adding the proc macros as {crate_name}_impl or {crate_name}_derive at the crate root, is there something you would prefer over that?

yeah, it looks good.

I guess I'm still curious as to what situation could cause the actual running of the query to fail like

I think, this is just a safe guard in case if the new tests will be added to the suite and the parser does not cover it. I used to rely on it when I move the lib to comply to the standard recently.

But you can print the errors from invalid case like that: Replace this if else to this:

       if jspath.is_ok() {
            Err(TestFailure::invalid(case))
        } else {
           if let Err(e) = jspath {
                println!("reason: {}", e);
                println!("selector: {}", case.selector);
            }
            Ok(())
        }

and run main.rs in rfc9535

it will print the reason from parse_json_path and the js path like that:

reason: Invalid json path: Invalid function expression `count   (@.*)`
selector: $[?count      (@.*)==1]

reason: Invalid json path: Invalid number of arguments for the function `match`: got 1
selector: $[?match(@.a)==1]

reason: Invalid json path: Invalid argument for the function `count`: expected a node, got a literal
selector: $[?count('string')>2]

reason: Invalid json path: Invalid number of arguments for the function `match`: got 3
selector: $[?match(@.a,@.b,@.c)==1]

selector: $[::-0]
reason: Failed to parse rule:  --> 1:5
  |
1 | $[::-01]
  |     ^---
  |
  = expected int

besok avatar May 09 '25 15:05 besok

Alright so I'm starting on this and I have run into an issue of circular dependencies because in order for downstream crates to access the macros the main crate will need to have the proc macro crate as a dependency, however in order for the proc macro crate to use the functions such as parse_json_path() it will need to import the main source.

Is there a common way of dealing with this? I haven't had a similar situation come up in any macros I've written before because the parsing logic is usually not attached to the crate it's parsing for.

I think I have 2 possible solutions for this that I've thought of so far:

~1. Pull the types and methods required by both(parse_json_path(), JsonPathError, JSPathParser, next_down(), jp_query()) out into a common crate that both of the other crates can depend on.~ ~2. Use something like https://crates.io/crates/pest-ast to confirm that there won't be test errors then replicate some of the additional checks done in the main crate like the 9007199254740991 index check and such. Feels like this would be too much code replication and would not perfectly match up with the crate's current implementation.~

EDIT: the more that I look at it I'm not sure if either of these solutions is good enough, they would either result in a lot of refactoring or a lot of new code

thehiddenwaffle avatar May 09 '25 18:05 thehiddenwaffle

The function-like macro can be created in the same crate. You can simply include it pub into this module

Speaking of the parsing problem, it is what i pointed out in my previous comment namely, you basically need to reproduce parsing logic in macro itself. It is possible but challenging.

You can probably experiment with proc-macro (for custom types) like

#[json_path]
struct SomeStruct(String);

or

#[derive(JsonPath)]
struct SomeStruct{
   #[json_path]
    query:String

}

And here you can use the default parser

besok avatar May 09 '25 19:05 besok

Honestly I may not have enough skill with "normal" declarative macros(if it's even possible?), feels like being able to use syn::Parse(which of course requires a separate proc-macro package) is the only way I personally will be able to make this happen. To that end I wonder about parsing either input tokens or input strings with syn or pest-ast respectively, then provide infallible conversions from the more explicit syn structs to JPQuery. This would allow the package to stay almost completely the same with some additional impl From<SelectionAST> for Selection{} that could be locked behind a feature by default. Also in order to minimize validation logic which pest-ast doesn't play nice with I have opened #95

thehiddenwaffle avatar May 13 '25 13:05 thehiddenwaffle

it is possible in compile time, for instance, in the following construction having a proc-macro:

#[json_path]
struct JsPath(String);
impl JsPath{
   fn new<T:Into<String>>( v:T){
        JsPath(v.into()) 
   }
}

fn some_fn(){
    let js = JsPath("invalid js path"); // here the error will be already while typing in editor
}

You can fiddle around with syn::Parse and pest-ast and maybe it will help. On the other hand it boils down to inability to reuse the parser (maybe with pest-ast it is possible, idk tbh)

besok avatar May 13 '25 15:05 besok

So I have a solution that seems to be working but I need to test it much more extensively.

There are some details:

  1. So far It's over 1200 lines of parser code. It should be noted that this AST could potentially completely replace the current datatypes as it is a strict superset. It doesn't have to be this way though, as I said I can just provide impls for impl From<JpQueryAST> for JpQuery{...} or the macro can directly translate to JpQuery::new(Segments::new(......so much more......)) using ToTokens.
  2. Using pest-ast/from-pest it is able to both parse token streams at compile time and raw strings at runtime, but those are two separate processes, granted that detail is not a huge worry to me because the test suite for that is as simple as taking all the RFC strings and parsing them with pest and then syn and checking that the results are strictly equal.

Basically I guess I'm asking, do you want some amount of all of this in the same crate? I think there are plenty of things that this crate could leverage by getting an AST for free, however I would understand if you feel that it's too much to pay for what many would consider scope creep. One option would be to just have feature locked dependency to this proc macro crate and have them be similar but separate entities.

I'm in this for the long haul in terms of maintenance because this crate is the bread and butter for an application I'm designing but I also understand that these are just the words of some random person on the internet from your perspective.

thehiddenwaffle avatar May 14 '25 16:05 thehiddenwaffle

Basically I guess I'm asking, do you want some amount of all of this in the same crate? I think there are plenty of things that this crate could leverage by getting an AST for free

I think it would be beneficial. You can roll out the pr, and we can analyze it together.

One small thing is to ensure that all tests pass and tests in rfc folder (running main) remain the same state :)

besok avatar May 15 '25 07:05 besok

Absolutely, I'll probably have a separate test suite for the macro(at least while the macro is still new, maybe this suite could be less comprehensive once it's stable) and will run all suites before creating a PR.

thehiddenwaffle avatar May 15 '25 12:05 thehiddenwaffle

About done with a proof on concept for one possible approach to how the macro types work, as well as a potential way to test all this efficiently and comprehensively. Will create a PR with many paragraphs worth of details/ideas/options.

thehiddenwaffle avatar May 19 '25 20:05 thehiddenwaffle

So with actually running the query, is this correct?

/// A convenience function to process a JSONPath query
/// and return a vector of values, omitting the path.
pub fn js_path_process<'a, 'b, T: Queryable>(
    path: &'b JpQuery,
    value: &'a T,
) -> Queried<Vec<QueryRef<'a, T>>> {
    match path.process(State::root(value)).data {
        Data::Ref(p) => Ok(vec![p.into()]),
        Data::Refs(refs) => Ok(refs.into_iter().map(Into::into).collect()),
        Data::Nothing => Ok(vec![]),
        // The grammar prevents this from happening
        Data::Value(_) => unreachable!(),
    }
}

When I make this change and run all test both smoke and RFC pass all, and on following all the possible code paths I'm pretty confident that everything eventually leads to flat_map which never returns Data::Value under any circumstances.

thehiddenwaffle avatar May 29 '25 15:05 thehiddenwaffle