juniper icon indicating copy to clipboard operation
juniper copied to clipboard

Switch to graphql-parser

Open tailhook opened this issue 6 years ago • 15 comments

Hi,

Since there is no activity in #32 I've decided to write a separate crate for graphql-parser.

It's not based on code here but built using excellent combine crate (making the grammar more readable in my opinion). So in a weekend project I could build the parser for the whole graphql grammar. Which means it should be on par with this crate for features (I've not checked every feature, but we also have subscriptions and triple-quoted strings already implemented).

What do you think about integrating with the graphql-parser crate instead of keeping custom parser?

tailhook avatar Feb 05 '18 00:02 tailhook

That looks much better than my half-baked Pest parser (although I still prefer pest to combine but that's not important)!

It would be really neat for the Rust ecosystem to standardise on a single parser so at least every crate wanting to do something with GraphQL doesn't have to rewrite it and it has many eyes on it.

Keats avatar Feb 06 '18 19:02 Keats

@tailhook I'll look over the code, but in principle, as long as the AST produced is decent and can easily be converted (without much cloning) I'd definitely be in favour.

This also concers https://github.com/nrc/graphql and @nrc. We've already talked about sharing fundamental infrastructure like the parser.

We'd also need the parser to handle parsing the GraphQL schema definition language.

theduke avatar Feb 06 '18 20:02 theduke

@tailhook also, it's important to have great error reporting for parsing errors and to be able to map each ast node to the query location.

theduke avatar Feb 06 '18 21:02 theduke

This also concers https://github.com/nrc/graphql and @nrc. We've already talked about sharing fundamental infrastructure like the parser.

Sure, here is the issue: https://github.com/nrc/graphql/issues/36

it's important to have great error reporting for parsing errors

I think it's good enough. Of course, we have positions of all the errors. And "unexpected A, expected B, C". I'm not sure what else to say. Perhaps, there aren't much test for error reporting yet.

Currently, we don't have decent API for the error besides: format me a string, but it should be easy to add just need to figure out how it might look like.

and to be able to map each ast node to the query location.

We have positions for all important elements. The approach is different that one here, but we can discuss details.

We'd also need the parser to handle parsing the GraphQL schema definition language.

Surely, on a todo list. Does seem like another weekend project in it's size. The good thing that tokenizer may be reused entirely.

tailhook avatar Feb 06 '18 21:02 tailhook

@tailhook all sounds great, I'll look over the code!

Would you be interested in moving the parser into the graphql-rust organization?

It would be cool to have all GraphQL related stuff under one hood, and it would ensure that someone else can maintain the parser if you need a break or whatever.

That's not in any way a requirement for me, though.

ps: I've written a demo parser in combine too, I think it's a good choice as in the the current parser combinator ecosystem.

theduke avatar Feb 06 '18 22:02 theduke

Would you be interested in moving the parser into the graphql-rust organization?

I'm fine with that. Should it be done now?

tailhook avatar Feb 06 '18 23:02 tailhook

SDL is implemented and pushed to crates.io as v0.2.0

tailhook avatar Feb 11 '18 22:02 tailhook

Is there anything blocking that?

Keats avatar Feb 26 '18 19:02 Keats

@Keats I haven't had time to look over the code.

We will probably want to have a custom-tailored ast for Juniper. I'm perfectly fine with getting an AST from the parser and mapping over it to get the custom one. It should be as efficient as possible to do so, though.

Especially wrt tokio, where sharing the AST between possible lots of resolvers (futures) should be as cheap as possible.

@mhallin had some concerns, so we probably want his input here.

theduke avatar Feb 26 '18 19:02 theduke

Hello, and thanks for bringing this up to the table!

While the parser part looks good, I've got some reservations for giving up control over the AST data structures to a general-purpose library. The issue I see is that we won't be able to model the data the way that suits a server application best.

For example, our AST of today uses Cow<&str> references everywhere into the original query source to avoid allocating tons of very small strings. This works today because the main execute function executes the entire query and returns when everything has been resolved. To support async code, where the AST needs to outlive the scope of the execute call, our probable AST of tomorrow has a special string type that works like a &str with a reference counted parent string.

I believe you could switch over graphql-parser to use this code, or build even better versions of all of this, but my reluctance comes from the fact that we now can do that without having to think about other use cases than ours.

mhallin avatar Feb 26 '18 20:02 mhallin

Well, AST in graphql-parser is currently owned, so works fine for async code right now.

Then we can try to benchmark and optimize things. Usually, rust/jemalloc is very good, so allocations themselves are rarely problem. Given that request processing does lots of validation and probably has a bunch of hashmaps the overhead of allocations may be amortized. Also if serde_json can afford owned strings I think graphql library can too.

Another point of view is that size of Spanning<SharedStr> is 72 bytes. I would rather try some kind of "inline string" library instead of that, given that most strings like type names and field names are much smaller than that. (But I'm not sure there is established "inline string" library in the community)

tailhook avatar Feb 26 '18 22:02 tailhook

IMO it's fine if we start from what's there and optimize later, especially if @tailhook is fine with moving the repo into the graphql-rust org.

I also agree that the AST is probably a minuscule part of the work that happens when executing a resolver.

But: Let's consider this relatively straight-forward query, executed with a traditional relational database under the hood, and something similar to dataloader:

query {
    users(page: 2, pagesize: 100) {
        id
        username
        posts(max: 20) {
          name
          comments(limit: 5) {
                id
                title
          }
        }
        likes(max: 50) {
            post {
                id
                title
            }
            time
        }
    }
}

Let's assume that the posts, posts.comments, likes, likes.posts queries are run as futures.

That means: users: 1 + 100 clones posts: 1 + 100 posts.comments: 100 * 20 likes: 1 + 100 likes.post: 100 * 50

Now let's think about 100 of those queries running concurrently... That's a lot of cloning of at least nested parts of the AST. Execution of the futures can be batched quite easily, but the creation of the futures much less so, and those futures will need to know the query.

So I think this can become a significant overhead.

There are a lot of possible optimizations, like just unsafely passing around references to the full AST, using an Arc of the initial query string like mhallin has tried, using some kind of per-query interner, ...

But it's something that we'll need to think about.

theduke avatar Feb 26 '18 23:02 theduke

Execution of the futures can be batched quite easily, but the creation of the futures much less so, and those futures will need to know the query.

I expect this to work by wrapping an Arc around a part of query. I think current SelectionSet which is basically a Vec<Selection> will be more like Arc<FnvHashSet<Field>> where fragments are already populated. So each resolver doesn't need to repeat fragment logic, and subselections can be referred to on their own.

I.e. the AST after validation/optimization is a bit different from what we have right after parsing anyway.

tailhook avatar Feb 27 '18 00:02 tailhook

Some progress: https://github.com/graphql-rust/graphql-parser/pull/19

theduke avatar Apr 09 '19 17:04 theduke

@tailhook considering this issue was opened more than 2 years ago. Would the progress on graphql-parser be sufficient to fully replace the the current parser? I.e. what is the current status of this issue?

tosti007 avatar Mar 14 '20 14:03 tosti007