
Schema composition order matters and causes query planner errors

Open YassineElbouchaibi opened this issue 1 year ago • 17 comments

Hello, it seems the order in which you publish schemas to the control plane matters and will cause issues down the line.

Here is a scenario I am experiencing:

  1. Publish subgraph A's schema defining type X
  2. Publish subgraph B's schema extending type X
  3. Publish subgraph A's schema defining type X with a minor change unrelated to type X
  4. At this point if you have subgraph C with type Y which has an attribute of type X, the subgraph seems to crash.

I am sorry I cannot provide full repro steps at this point as I am on mobile, I will provide a github repo later.

The error starts happening after this PR (#1092) which is released in [email protected]

In the meantime, here is the error message produced by router versions v0.107.4 through v0.151.1. This one is specifically from 0.117.0, with minor details redacted:

ERROR internal error {"hostname": "*******", "pid": 1, "component": "@wundergraph/router", "service_version": "0.117.0", "reqId": "*******/*******", "error": "1 error occurred:\n\t* failed to obtain planning paths: failed to create planning paths, missing paths: [query.someQuery.request.data.someAttribute.$1ThisIsTheTypeX.__typename], has field waiting for dependency: false\n\n"}
github.com/wundergraph/cosmo/router/core.logInternalErrorsFromReport
        github.com/wundergraph/cosmo/router/core/errors.go:89
github.com/wundergraph/cosmo/router/core.writeOperationError
        github.com/wundergraph/cosmo/router/core/errors.go:173
github.com/wundergraph/cosmo/router/core.(*PreHandler).Handler-fm.(*PreHandler).Handler.func1
        github.com/wundergraph/cosmo/router/core/graphql_prehandler.go:306
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2141
github.com/wundergraph/cosmo/router/core.NewWebsocketMiddleware.func1.1
        github.com/wundergraph/cosmo/router/core/websocket.go:136
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2141
github.com/wundergraph/cosmo/router/internal/graphiql.(*Playground).ServeHTTP
        github.com/wundergraph/cosmo/router/internal/graphiql/playgroundhandler.go:33
github.com/wundergraph/cosmo/router/internal/requestlogger.(*handler).ServeHTTP
        github.com/wundergraph/cosmo/router/internal/requestlogger/requestlogger.go:199
github.com/wundergraph/cosmo/router/pkg/trace.(*Middleware).Handler.func1
        github.com/wundergraph/cosmo/router/pkg/trace/middleware.go:52
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2141
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*middleware).serveHTTP
        go.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:229
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.NewMiddleware.func1.1
        go.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:81
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2141
github.com/wundergraph/cosmo/router/core.(*graphServer).buildGraphMux.func5.1
        github.com/wundergraph/cosmo/router/core/graph_server.go:522
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2141
github.com/wundergraph/cosmo/router/core.(*graphServer).buildGraphMux.func4.1
        github.com/wundergraph/cosmo/router/core/graph_server.go:503
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2141
github.com/go-chi/chi/v5.(*Mux).ServeHTTP
        github.com/go-chi/chi/[email protected]/mux.go:73
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2141
github.com/go-chi/chi/v5.(*Mux).Mount.func1
        github.com/go-chi/chi/[email protected]/mux.go:315
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2141
github.com/klauspost/compress/gzhttp.NewWrapper.func1.1
        github.com/klauspost/[email protected]/gzhttp/compress.go:495
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2141
github.com/go-chi/chi/v5.(*ChainHandler).ServeHTTP
        github.com/go-chi/chi/[email protected]/chain.go:31
github.com/go-chi/chi/v5.(*Mux).routeHTTP
        github.com/go-chi/chi/[email protected]/mux.go:443
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2141
github.com/wundergraph/cosmo/router/pkg/cors.(*cors).ServeHTTP
        github.com/wundergraph/cosmo/router/pkg/cors/config.go:74
github.com/go-chi/chi/v5/middleware.RealIP.func1
        github.com/go-chi/chi/[email protected]/middleware/realip.go:36
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2141
github.com/go-chi/chi/v5/middleware.RequestID.func1
        github.com/go-chi/chi/[email protected]/middleware/request_id.go:76
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2141
github.com/wundergraph/cosmo/router/core.newGraphServer.RequestSize.func9.1
        github.com/wundergraph/cosmo/router/internal/middleware/request_size.go:14
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2141
github.com/wundergraph/cosmo/router/internal/recoveryhandler.(*handler).ServeHTTP
        github.com/wundergraph/cosmo/router/internal/recoveryhandler/recovery.go:39
github.com/go-chi/chi/v5.(*Mux).ServeHTTP
        github.com/go-chi/chi/[email protected]/mux.go:90
github.com/wundergraph/cosmo/router/core.newServer.func1
        github.com/wundergraph/cosmo/router/core/http_server.go:62
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2141
net/http.serverHandler.ServeHTTP
        net/http/server.go:2943
net/http.(*conn).serve
        net/http/server.go:2014

For now a workaround seems to be to republish subgraph B's schema extending type X.

I'm sorry for not following the standard format, I will tweak this issue later.

YassineElbouchaibi avatar Dec 12 '24 17:12 YassineElbouchaibi

WunderGraph commits fully to Open Source and we want to make sure that we can help you as fast as possible. The roadmap is driven by our customers and we have to prioritize issues that are important to them. You can influence the priority by becoming a customer. Please contact us here.

github-actions[bot] avatar Dec 12 '24 17:12 github-actions[bot]

Here is a GitHub repo that reproduces the issue: https://github.com/YassineElbouchaibi/cosmo-issue-1440-repro

Steps to start a router that works and one that breaks depending on the composition order are in the readme as well as an example query.

YassineElbouchaibi avatar Dec 17 '24 19:12 YassineElbouchaibi

Have you had the opportunity to attempt reproducing the bug?

YassineElbouchaibi avatar Jan 07 '25 19:01 YassineElbouchaibi

@YassineElbouchaibi Sorry—I think the festive holidays interrupted this.

What would be helpful and make things quicker is if you could attach the working router execution config json and the non-working router execution config json.

Aenimus avatar Jan 25 '25 00:01 Aenimus

Just tested with [email protected], this issue still exists.

> What would be helpful and make things quicker is if you could attach the working router execution config json and the non-working router execution config json.

Attaching them here.

cosmo-router-broken.json cosmo-router-works.json

YassineElbouchaibi avatar Feb 20 '25 17:02 YassineElbouchaibi

Hi @YassineElbouchaibi

We've confirmed there is an issue and we need to investigate.

Thank you for the report!

The WunderGraph Team

Aenimus avatar Feb 25 '25 13:02 Aenimus

Any updates on this?

YassineElbouchaibi avatar Apr 07 '25 14:04 YassineElbouchaibi

Just tested with v0.206.0, same issue still occurs:

{"hostname": "localhost", "pid": 1, "service": "@wundergraph/router", "service_version": "0.206.0", "request_id": "localhost/scLnlVGtWQ-000005", "trace_id": "263008a84f4cc8acb086d3ba87b48b64", "error": "1 error occurred:\n\t* failed to obtain planning paths: failed to create planning paths, missing paths: [query.c.a.$3A1.__typename], has field waiting for dependency: false\n\n"}

For the same repro as above

YassineElbouchaibi avatar May 07 '25 18:05 YassineElbouchaibi

Hi @YassineElbouchaibi

Thanks for checking on the latest version

It is an edge case of abstract selections combined with interface objects, i.e. we know where the problem is, but we haven't yet had enough time to jump onto it.

Thanks for your patience, WunderGraph Team

devsergiy avatar May 12 '25 14:05 devsergiy

It appears we've started hitting this same issue as well. Don't suppose there's been any further investigation into it?

iDub79 avatar May 28 '25 01:05 iDub79

> Hi @YassineElbouchaibi
>
> Thanks for checking on the latest version
>
> It is an edge case of abstract selections combined with interface objects, i.e. we know where the problem is, but we haven't yet had enough time to jump onto it.
>
> Thanks for your patience, WunderGraph Team

Hello, do you have an idea of how you want to solve this problem, or is it still at a stage where you only know where the problem is? If you know approximately how to solve it, would it be possible to explain the approach here in case I find time to look into it?

YassineElbouchaibi avatar Jun 04 '25 14:06 YassineElbouchaibi

Just tested with v0.228.0; the issue still occurs using the same repro as above:

03:31:14 AM ERROR core/errors.go:102 internal error {"hostname": "docker-desktop", "pid": 1, "service": "@wundergraph/router", "service_version": "0.228.0", "request_id": "docker-desktop/2HKX6Uzn6g-000005", "trace_id": "389005195e71b0255ee43f085813f095", "error": "1 error occurred:\n\t* failed to obtain planning paths: failed to create planning paths, missing paths: [query.c.a.$0A1.__typename], has field waiting for dependency: false\n\n"}

YassineElbouchaibi avatar Jul 04 '25 03:07 YassineElbouchaibi

Is this the same issue as discussed in this discord thread? https://discord.com/channels/738739428314316823/1374488425377435659

cmtm avatar Jul 08 '25 23:07 cmtm

Both seem to be an issue with "abstract selections combined with interface objects" so I would guess yes

YassineElbouchaibi avatar Jul 09 '25 03:07 YassineElbouchaibi

I saw there were some updates on the Discord thread above, so I tested the repro with router v0.239.2. The issue still occurs using the same repro as above:

17:06:53 PM ERROR core/errors.go:102 internal error {"hostname": "hostname", "pid": 1, "service": "@wundergraph/router", "service_version": "0.239.2", "request_id": "hostname/PKrtdkdcAb-000005", "trace_id": "62ea539076093fc572d22d4b4ba25e7d", "error": "1 error occurred:\n\t* failed to obtain planning paths: failed to create planning paths, missing paths: [query.c.a.$3A1.__typename], has field waiting for dependency: false\n\n"}
github.com/wundergraph/cosmo/router/core.logInternalErrorsFromReport
        github.com/wundergraph/cosmo/router/core/errors.go:102
github.com/wundergraph/cosmo/router/core.writeOperationError
        github.com/wundergraph/cosmo/router/core/errors.go:321
github.com/wundergraph/cosmo/router/core.(*PreHandler).Handler-fm.(*PreHandler).Handler.func1
        github.com/wundergraph/cosmo/router/core/graphql_prehandler.go:434
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2220
github.com/wundergraph/cosmo/router/core.NewWebsocketMiddleware.func1.1
        github.com/wundergraph/cosmo/router/core/websocket.go:143
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2220
github.com/wundergraph/cosmo/router/internal/requestlogger.(*handler).ServeHTTP
        github.com/wundergraph/cosmo/router/internal/requestlogger/requestlogger.go:171
github.com/wundergraph/cosmo/router/core.(*graphServer).buildGraphMux.func4.1
        github.com/wundergraph/cosmo/router/core/graph_server.go:1033
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2220
github.com/wundergraph/cosmo/router/core.(*graphServer).buildGraphMux.func3.1
        github.com/wundergraph/cosmo/router/core/graph_server.go:1008
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2220
github.com/wundergraph/cosmo/router/internal/recoveryhandler.(*handler).ServeHTTP
        github.com/wundergraph/cosmo/router/internal/recoveryhandler/recovery.go:59
github.com/wundergraph/cosmo/router/core.(*graphServer).buildGraphMux.func1.1
        github.com/wundergraph/cosmo/router/core/graph_server.go:945
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2220
github.com/go-chi/chi/v5.(*Mux).ServeHTTP
        github.com/go-chi/chi/[email protected]/mux.go:73
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2220
github.com/wundergraph/cosmo/router/core.newGraphServer.func2.CookieWhitelist.2.1
        github.com/wundergraph/cosmo/router/internal/middleware/cookie_filter.go:12
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2220
github.com/klauspost/compress/gzhttp.NewWrapper.func1.1
        github.com/klauspost/[email protected]/gzhttp/compress.go:495
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2220
github.com/go-chi/chi/v5.(*ChainHandler).ServeHTTP
        github.com/go-chi/chi/[email protected]/chain.go:31
github.com/go-chi/chi/v5.(*Mux).routeHTTP
        github.com/go-chi/chi/[email protected]/mux.go:478
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2220
github.com/wundergraph/cosmo/router/pkg/trace.(*Middleware).Handler.func1
        github.com/wundergraph/cosmo/router/pkg/trace/middleware.go:54
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2220
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*middleware).serveHTTP
        go.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:229
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.NewMiddleware.func1.1
        go.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:81
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2220
github.com/wundergraph/cosmo/router/pkg/cors.(*cors).ServeHTTP
        github.com/wundergraph/cosmo/router/pkg/cors/config.go:76
github.com/go-chi/chi/v5/middleware.RealIP.func1
        github.com/go-chi/chi/[email protected]/middleware/realip.go:36
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2220
github.com/go-chi/chi/v5/middleware.RequestID.func1
        github.com/go-chi/chi/[email protected]/middleware/request_id.go:76
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2220
github.com/wundergraph/cosmo/router/core.newGraphServer.RequestSize.func6.1
        github.com/wundergraph/cosmo/router/internal/middleware/request_size.go:14
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2220
github.com/wundergraph/cosmo/router/core.newGraphServer.HandleCompression.func5.1
        github.com/wundergraph/cosmo/router/internal/middleware/compression.go:62
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2220
github.com/wundergraph/cosmo/router/internal/recoveryhandler.(*handler).ServeHTTP
        github.com/wundergraph/cosmo/router/internal/recoveryhandler/recovery.go:59
github.com/go-chi/chi/v5.(*Mux).ServeHTTP
        github.com/go-chi/chi/[email protected]/mux.go:90
github.com/wundergraph/cosmo/router/core.newServer.func1
        github.com/wundergraph/cosmo/router/core/http_server.go:77
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2220
net/http.serverHandler.ServeHTTP
        net/http/server.go:3210
net/http.(*conn).serve
        net/http/server.go:2092

YassineElbouchaibi avatar Jul 30 '25 17:07 YassineElbouchaibi

Hi @YassineElbouchaibi

Thanks for the check. We haven't fixed this behaviour yet.

Thanks, WunderGraph Team

devsergiy avatar Jul 30 '25 17:07 devsergiy

Hello @devsergiy! Thanks for your response. Generally speaking, having the query planner rely on a sorted composition result rather than the current raw result would lead to more predictable and stable router outcomes. This approach reduces non-determinism in planning, simplifies debugging, and improves reproducibility across environments. It also enhances collaboration, since developers are less likely to encounter inconsistent behavior due to subtle differences in service composition order.

Currently, everything might appear to work fine in local, QA, or development environments but then unexpectedly fail in production, either immediately or later when a new subgraph is published. These issues are difficult to trace and reproduce, often surfacing only under specific composition orders or deployment sequences. By enforcing a sorted composition result, we reduce the surface area for such environment-specific discrepancies, making the system more robust, predictable, and developer-friendly across the entire software development lifecycle.

Here is another instance where the query planner behaves inconsistently depending on the composition order: https://discord.com/channels/738739428314316823/1366771735231336611

I posted this thought there as well, but I am also posting it here to keep it in this thread!

YassineElbouchaibi avatar Jul 30 '25 17:07 YassineElbouchaibi