
Progressive `@override` controlled by some external state

Open lennyburdette opened this issue 2 years ago • 8 comments

Multiple customers have asked for a mechanism to gradually override a field when migrating from one subgraph to another.

If we added an additional label argument to the @override directive, the query planner could use that label to fetch a value from a "query planning context".

# subgraph a
type Query {
  foo: Bar
}
# subgraph b
type Query {
  foo: Bar @override(from: "a", label: "my-rollout")
}

With a context of { "my-rollout": true }, the override would apply and the query planner routes to subgraph b. For any other context (false or a missing key), the override does not apply and the query planner routes to subgraph a.
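To make the proposed semantics concrete, here is a minimal sketch of the per-field decision, assuming the "query planning context" is a plain map from label to boolean. `OverrideDirective`, `PlanningContext`, and `resolveOwner` are hypothetical names for illustration, not Router or Federation APIs. The subgraph declaring `@override` (b) resolves the field only when its label is explicitly `true`; otherwise the original subgraph (a) does.

```typescript
// Hypothetical shape of a field carrying the proposed @override(from:, label:) directive.
interface OverrideDirective {
  owner: string;  // subgraph that declares @override (here "b")
  from: string;   // subgraph being overridden (here "a")
  label?: string; // proposed label argument; undefined means "always override"
}

// Hypothetical "query planning context": label name -> enabled?
type PlanningContext = Record<string, boolean>;

// Returns the subgraph that should resolve the field for this request.
function resolveOwner(directive: OverrideDirective, ctx: PlanningContext): string {
  // A plain @override without a label behaves as today: always override.
  if (directive.label === undefined) {
    return directive.owner;
  }
  // Only an explicit `true` activates the override; false or a missing key do not.
  return ctx[directive.label] === true ? directive.owner : directive.from;
}

const foo: OverrideDirective = { owner: "b", from: "a", label: "my-rollout" };
console.log(resolveOwner(foo, { "my-rollout": true })); // b
console.log(resolveOwner(foo, {}));                      // a
```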

We could populate the context in a few ways:

  • Environment variables
  • Rhai scripts with simple stateless logic (e.g. inspecting headers)
  • A coprocessor hooked up to Unleash, Launch Darkly, or any other feature flagging system.
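A sketch of how such a context could be assembled from the three sources above, with later sources taking precedence (flag system, then environment, then a per-request header). The `FlagProvider` interface stands in for whatever a coprocessor would call; it is not the Unleash or LaunchDarkly SDK API. The `OVERRIDE_<LABEL>` variable and `x-override-<label>` header naming schemes are assumptions, and header keys are assumed to be lowercased.

```typescript
// Hypothetical stand-in for a feature-flag client reached via a coprocessor.
// Not the real Unleash or LaunchDarkly SDK API.
interface FlagProvider {
  isEnabled(flag: string, requestKey: string): boolean;
}

type PlanningContext = Record<string, boolean>;

// Builds the per-request context for the given labels.
// Precedence (lowest to highest): flag provider < environment < request header.
function buildContext(
  labels: string[],
  env: Record<string, string | undefined>,
  headers: Record<string, string | undefined>,
  flags: FlagProvider | undefined,
  requestKey: string,
): PlanningContext {
  const ctx: PlanningContext = {};
  for (const label of labels) {
    // 1. External feature-flag system (coprocessor).
    if (flags !== undefined) {
      ctx[label] = flags.isEnabled(label, requestKey);
    }
    // 2. Environment variable, e.g. OVERRIDE_MY_ROLLOUT=true (hypothetical naming).
    const envName = "OVERRIDE_" + label.toUpperCase().replace(/[^A-Z0-9]/g, "_");
    if (env[envName] !== undefined) {
      ctx[label] = env[envName] === "true";
    }
    // 3. Stateless header logic, as a Rhai script might do (hypothetical header).
    const header = headers["x-override-" + label];
    if (header !== undefined) {
      ctx[label] = header === "true";
    }
  }
  return ctx;
}
```

Letting the header win makes it easy to force a single request down either path while testing, even when the flag system says otherwise.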

lennyburdette, Oct 13 '22

Thanks @lennyburdette!

Some context on the problems we've seen without this functionality: a number of our teams are moving from our monolithic graph or pre-existing REST services into federation. In doing so, they port large chunks of functionality into a subgraph and then continue breaking those down into more subgraphs to better distribute ownership and responsibility. In effect, we are seeing a desire to support partial entity migrations. This would be a non-issue during initial development, but performing the migration once the entities are taking production traffic increases the rollout risk. The ability to cut over a percentage of traffic to the new subgraph and validate that it is stable has been a critical ask for these teams. The lack of this capability has led teams who currently own REST services to question the move to Federation in the first place.

As an example, consider this type:

#Subgraph A
type ImportantExample @key(fields: "field2")  {
    field1: String
    field2: String
    field3: String
    field4: String
    field5: String
}

We would like to split it into:

#Subgraph A
type ImportantExample @key(fields: "field2")  {
    field1: String
    field2: String
    field3: String
    field4: String
    field5: String
}

#Subgraph B
type ImportantExample @key(fields: "field2")  {
    field1: String @override(from: "a")
    field4: String @override(from: "a")
    field2: String
}

The problem here is that when Subgraph B is integrated, it immediately takes all traffic for field1 and field4 that was previously hitting Subgraph A. Subgraph B has yet to be pressure-tested in production, which poses a much greater risk to its release. Techniques such as traffic mirroring before composing Subgraph B are inadequate because composing it changes the query plan. This puts the onus of the dial-up on something that can generate updated query plans as part of the dial-up. We do not have the option of callers slowly cutting over to new endpoints, as we would with a new REST service, because there is only one unified graph.

While it would be ideal to have an API (like a rover command) that could set the new rollout percentage, for our needs it is okay to require a new deployment as a first step. This would allow us to manage that value within our deployment pipeline, where we are already performing some schema manipulation.
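Since the rollout percentage would be set at deployment time, the per-request check needs to be cheap and repeatable. A minimal sketch, assuming each request carries a stable key (such as a user or client ID): hash the key into buckets and compare against the configured percentage. `fnv1a` and `inRollout` are hypothetical helpers; FNV-1a is used here only because it is short and deterministic.

```typescript
// FNV-1a (32-bit) over UTF-16 code units: small, deterministic, non-cryptographic.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime mod 2^32
  }
  return hash;
}

// Hypothetical rollout check. `percent` (0-100) comes from deployment config;
// `key` is something stable per caller, such as a user or client ID.
function inRollout(label: string, key: string, percent: number): boolean {
  // Map the key into one of 10,000 buckets (0.01% granularity). Salting with the
  // label keeps different rollouts independent of each other.
  const bucket = fnv1a(label + ":" + key) % 10000;
  return bucket < Math.round(percent * 100);
}
```

Hashing a stable key rather than drawing a random number keeps a given caller on the same subgraph across requests, and because buckets never move, dialing the percentage up only adds callers to the new subgraph.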

thiscompsciguy, Oct 27 '22

Another approach would be to handle this at the Uplink/supergraph delivery layer. Regardless of @override, what is really happening is that the Router needs to see a new version of the supergraph for a small percentage of traffic. So whether you are adding or deleting a field or adding @override, you want to slowly roll out a new version of the supergraph.

On its own, this would not handle cases where only one or a few Router instances serve the load, because the rollout could only be as granular as whole instances receiving the new supergraph. It might have to go further and let the Router run two supergraph versions in one instance, with some external value controlling which version handles traffic, but it could be another lever. I don't know if this is better; I'm just throwing out other ideas.
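One way to picture the "two supergraph versions in one instance" idea is a holder that picks a version per request based on an externally controlled weight. This is a hypothetical sketch: `Supergraph` is a stand-in for a parsed schema plus its query planner, and `DualSupergraph` is not a Router type.

```typescript
// Stand-in for a parsed supergraph; a real Router holds a schema and query planner.
interface Supergraph {
  version: string;
  sdl: string;
}

// Hypothetical holder for two supergraph versions inside one Router instance.
class DualSupergraph {
  constructor(
    private readonly stable: Supergraph,
    private readonly candidate: Supergraph,
    // External lever: share of requests (0..1) served by the candidate.
    private readonly candidateWeight: () => number,
    // Injected so selection can be made deterministic in tests.
    private readonly random: () => number = Math.random,
  ) {}

  // Picks the supergraph whose query plans should serve this request.
  select(): Supergraph {
    return this.random() < this.candidateWeight() ? this.candidate : this.stable;
  }
}
```

Because `candidateWeight` is read on every request, the external value can change at runtime without reloading either supergraph.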

smyrick, Dec 15 '22

👋 adding another data point for wanting this feature (at Yelp)

Implementation detail note: having a runtime function that could be invoked to make the decision would be preferable over a number baked into the schema, so we can toggle this from external sources.

Here's a pretty horrible way of hacking around this in a pinch I suppose:

https://stackblitz.com/edit/federation-traffic-split-demo?file=gateway.js,b.js,a.js,index.js

(Try it with this query)
query {
  getBusiness(id: 3) {
    name
  }
}

tl;dr: you could sniff for the subgraph query containing the field in question, and use a RemoteGraphQLDataSource to do request.url = ... and manually override where it gets resolved from.

(This is clearly terrible. The most obvious pitfall is that the query might contain fields that the subgraph doesn't know how to resolve. But in a world where a new subgraph is being set up to move fields out of a monolith, you can make sure to set up @override(from: "...") the right way around and be OK, such that the default query made to the new subgraph won't contain any other fields from the monolith that it doesn't know how to resolve yet.)
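The routing decision in this workaround can be sketched independently of the gateway. Per the comment above, the demo puts similar logic inside a RemoteGraphQLDataSource; the functions below (`selectsField`, `pickSubgraphUrl`) are hypothetical and only show how the URL could be chosen. The field check is a naive text match rather than a GraphQL parse, so a field name appearing in a comment or string argument could fool it.

```typescript
// Naive check: does the subgraph operation text mention `field` as a whole word?
// This is plain text matching, not GraphQL parsing.
function selectsField(query: string, field: string): boolean {
  const escaped = field.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`\\b${escaped}\\b`).test(query);
}

// Hypothetical routing rule: send operations touching the migrated field to the
// new subgraph for a share of traffic, and everything else to the original URL.
function pickSubgraphUrl(
  query: string,
  migratedField: string,
  originalUrl: string,
  newUrl: string,
  roll: number,     // e.g. Math.random(), in [0, 1)
  newShare: number, // share (0..1) of matching operations sent to newUrl
): string {
  return selectsField(query, migratedField) && roll < newShare ? newUrl : originalUrl;
}
```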

We may have to deploy something like this whilst we wait for this proposal or similar to be released.

(In other news, thanks everyone for the discussion around this and pointing me here. Federation and the folks working on it are awesome! 😊)

magicmark, Jan 17 '23

We’re also looking for something like this. The idea of adding a parameter to @override is interesting, though I feel it might be somewhat brittle, since you’d need to match it to a variable in the router without any support from composition.

I proposed in the router project that it offer a set of hooks that would receive a field->subgraph list and could alter the order of preference for resolving a field when there are multiple options (as in @shareable or @override cases). This would permit a lot of flexibility in how the end user decides to handle routing, for instance by wiring up LaunchDarkly or some other feature flag system to control the rollout of a field migration.
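A sketch of what such a hook could look like, assuming the router passes each field's candidate subgraphs in its default order of preference and uses whatever order the hook returns. `FieldCandidates`, `RoutingHook`, and `migrationHook` are hypothetical; `isEnabled` stands in for a LaunchDarkly or other feature-flag lookup.

```typescript
// Hypothetical hook input: a field and the subgraphs able to resolve it,
// in the planner's default order of preference.
interface FieldCandidates {
  typeName: string;
  fieldName: string;
  subgraphs: string[];
}

// Hypothetical per-request hook: returns the subgraphs in preferred order.
type RoutingHook = (field: FieldCandidates, requestKey: string) => string[];

// Example hook: during a migration, prefer `to` for one field when a flag says so.
function migrationHook(
  typeName: string,
  fieldName: string,
  to: string,
  isEnabled: (requestKey: string) => boolean,
): RoutingHook {
  return (field, requestKey) => {
    const matches = field.typeName === typeName && field.fieldName === fieldName;
    if (!matches || field.subgraphs.indexOf(to) === -1 || !isEnabled(requestKey)) {
      return field.subgraphs; // leave the planner's default order untouched
    }
    // Move `to` to the front, keeping the relative order of the others.
    return [to, ...field.subgraphs.filter((s) => s !== to)];
  };
}
```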

paulpdaniels, Feb 23 '23

Hey all, thanks for the feedback and additional context on this issue! This is something we're currently researching, and I'd be happy to chat with anyone about your use cases or urgency on this feature. We're trying to get a better sense of the priority for this, so any feedback is welcome. You can set up time directly on my Calendly link, or we can chat over email at [email protected].

korinne, Feb 23 '23

I'd love to see a way to meter (slow roll out) and automatically fall back to the previous resolver of a field. That way, if the subgraph that has implemented @override begins failing to resolve, a human doesn't have to intervene to switch back to the original resolving subgraph.
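Automatic fallback could look like a small circuit breaker per overridden field: try the new subgraph, fall back to the original on failure, and stop trying the new one after repeated failures. This is a hypothetical, synchronous sketch (`OverrideFallback` is not a Router feature); real subgraph fetches are asynchronous.

```typescript
// Hypothetical per-field breaker: prefer the overriding subgraph, fall back to
// the original one on failure, and stop trying the new one after repeated failures.
class OverrideFallback<T> {
  private consecutiveFailures = 0;

  constructor(private readonly maxFailures: number) {}

  // True once the breaker has tripped and all traffic goes to the original subgraph.
  get tripped(): boolean {
    return this.consecutiveFailures >= this.maxFailures;
  }

  resolve(fromNew: () => T, fromOriginal: () => T): T {
    if (this.tripped) {
      return fromOriginal();
    }
    try {
      const value = fromNew();
      this.consecutiveFailures = 0; // a success resets the count
      return value;
    } catch {
      this.consecutiveFailures++;
      return fromOriginal();
    }
  }
}
```

A production breaker would also need a cooldown after which it retries the new subgraph (a "half-open" state); this sketch stays tripped for the lifetime of the object.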

I do agree with the statement from @paulpdaniels that a decoupled parameter that is defined by Router/Query planner but used in the subgraph schema may not be ideal. Something in Router that would allow for a plugin to control the changeover of overridden fields may be a better solution?

Right now, the supergraph appears to maintain enough information to know who the new and old subgraph are:

Subgraph A (marketplace_listing):

type Book {
  title: String!
}

Subgraph B (recs):

type Book {
  title: String! @override(from: "marketplace_listing")
}

Supergraph:

type Book
  @join__type(graph: MARKETPLACE_LISTING)
  @join__type(graph: RECS)
{
  title: String! @join__field(graph: RECS, override: "marketplace_listing")
}

It might be helpful if the override argument to @join__field were to be resolved to one of the join__Graph enum values. But as it stands, that @join__field does indicate who is taking over from whom.
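Resolving the `override` argument to a `join__Graph` value is a small mapping. The sketch below assumes enum values are the upper-cased subgraph name with invalid characters replaced by `_`, which matches the `marketplace_listing` → `MARKETPLACE_LISTING` example above but may not cover every case composition handles (for instance, name collisions). `JoinField`, `toJoinGraphEnum`, and `overrideEdge` are hypothetical.

```typescript
// Simplified view of a @join__field application from the supergraph SDL.
interface JoinField {
  graph: string;     // join__Graph enum value, e.g. "RECS"
  override?: string; // subgraph *name* being overridden, e.g. "marketplace_listing"
}

// Assumption: enum value = upper-cased name with non [A-Z0-9_] characters as "_".
// Composition's real rule may differ in edge cases.
function toJoinGraphEnum(subgraphName: string): string {
  return subgraphName.toUpperCase().replace(/[^A-Z0-9_]/g, "_");
}

// Returns which join__Graph value took the field over from which, if any.
function overrideEdge(field: JoinField): { from: string; to: string } | undefined {
  if (field.override === undefined) {
    return undefined;
  }
  return { from: toJoinGraphEnum(field.override), to: field.graph };
}
```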

gwardwell, May 11 '23

I asked about this today on Discord, and what I was hoping for was some way to opt in to a subgraph override on a per-request basis. My use case is that I want to work on a subgraph progressively, merging to master without the subgraph field(s) being 100% ready yet. Once it is ready, I could remove the gate and have the override fully apply to all requests.
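A per-request opt-in could be as simple as a header listing the override labels the caller wants enabled. The `x-opt-in-overrides` header and both functions below are hypothetical, and header names are assumed to be lowercased.

```typescript
// Hypothetical header carrying a comma-separated list of override labels,
// e.g. "x-opt-in-overrides: my-rollout, books-migration".
const OPT_IN_HEADER = "x-opt-in-overrides";

// Parses the header into the set of labels to activate for this request only.
function optedInLabels(headers: Record<string, string | undefined>): Set<string> {
  const labels = new Set<string>();
  const raw = headers[OPT_IN_HEADER];
  if (raw === undefined) return labels;
  for (const part of raw.split(",")) {
    const label = part.trim();
    if (label.length > 0) labels.add(label);
  }
  return labels;
}

// Gated overrides apply only when the caller opts in; removing the gate later
// amounts to treating the override as always on.
function overrideApplies(label: string, headers: Record<string, string | undefined>): boolean {
  return optedInLabels(headers).has(label);
}
```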

chadxzs, Jun 21 '23

This has now shipped in Router 1.39.0 and Federation v2.7, so we can now close this issue.

https://www.apollographql.com/docs/federation/federation-versions#v27
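The shipped `label` argument supports percentage-based rollouts using labels of the form `percent(<number>)` (see the v2.7 notes linked above). A minimal sketch of evaluating such a label per request follows; `parsePercentLabel` and `overrideEnabled` are hypothetical and do not reflect the Router's actual implementation. Any other label string is reported as unresolved (`undefined`).

```typescript
// Parses labels of the form "percent(<number>)", returning the percentage,
// or undefined for any other label string or an out-of-range value.
function parsePercentLabel(label: string): number | undefined {
  const match = /^percent\((\d+(?:\.\d+)?)\)$/.exec(label.trim());
  if (match === null) return undefined;
  const value = parseFloat(match[1]);
  return value >= 0 && value <= 100 ? value : undefined;
}

// Decides, for one request, whether an override with this label applies.
// `roll` is a number in [0, 100), e.g. Math.random() * 100.
function overrideEnabled(label: string, roll: number): boolean | undefined {
  const percent = parsePercentLabel(label);
  return percent === undefined ? undefined : roll < percent;
}
```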

smyrick, Mar 20 '24