router icon indicating copy to clipboard operation
router copied to clipboard

dynamic feature toggles

Open yanns opened this issue 2 years ago • 9 comments

Is your feature request related to a problem? Please describe.

In our current GraphQL implementation, we support feature toggles in the GraphQL schema. Based on some variables read from the HTTP request (like header or information from the URI path...), we enable some private beta features. The GraphQL schema reflects those feature toggles. Even an introspection returns only the types/fields that are enabled for a certain request.

I would like to open the discussion if something similar would be possible on the router.

Describe the solution you'd like

A possible solution could be:

  • subgraph expressed with directives with types / fields are only for a certain feature toggle
  • in the router, a plugin could read all data needed from the HTTP request and manipulate the schema to add/remove types/fields.

As a possible optimization, we could try to cache the manipulated schemas. This cache would have to be memory bound to avoid consuming too much memory.

Describe alternatives you've considered

An alternative is to always expose all fields, private beta or not, and to return an error at resolution time if a user is trying to use a feature that is not enabled for her yet. That way, the router is not concerned with the feature toggles. All the different services have to check that during resolution time.

yanns avatar May 20 '22 13:05 yanns

This sounds like contracts: https://www.apollographql.com/docs/studio/contracts/

Currently the router only allows one contract variant, but maybe this could be relaxed such that a router could support multiple variants and switch based on header. @yanns Does this sound like it'd fulfill your needs?

BrynCooke avatar May 21 '22 20:05 BrynCooke

Yes, this sound interesting. From the documentation, I could not really understand how I can activate / deactivate certain contracts based on information from the HTTP request.

From your comment, I understand that this is a feature that does not exist yes, and that could be added. Am I understanding right?

yanns avatar May 22 '22 16:05 yanns

It's a possibility. At this stage I'm not sure how much demand there is for this. Pinging @prasek in case others have asked.

BrynCooke avatar May 22 '22 17:05 BrynCooke

welcome @yanns 👋

@BrynCooke it's come up a handful of times. if you're processing a lot of variants for large n multi-tenant use cases the performance can suffer due to cache eviction, so generally we recommend to run a single router instance per variant. also for the typical public/partner/private contract variants you usually want to enable different plugin/middleware auth configs which also favors separate router instances per variant.

for alpha/beta/experimental contract variants it's worth looking at what it would take for the router to support running multiple variants either directly or by running multiple router core instances (one per variant) as a lib in a single process and switching based on a header, similar to what Lenny was doing with his experimental dev gateway.

prasek avatar May 22 '22 17:05 prasek

It's possible to do, but would probably requires some reasonably significant changes to the codebase. It would have to be a roadmap item that we plan in.

BrynCooke avatar May 22 '22 19:05 BrynCooke

I don't know if you can work with external contributions, but I can check with my company if we can help you here. We could free some engineer time to support you.

At the same time, it can make the process more complex to synchronize. And I guess that it's not the simplest change to start with. When I introduced that feature in our backend, I had to change from one schema to n schemas with a bounded cache.

yanns avatar May 23 '22 06:05 yanns

@yanns We do want external contributions, and we have started taking them in already. As you say though, this probably isn't the best ticket to start with. For context we're going to me making some structural changes in the next few weeks that probably means it's not quite the time to start coding on this (although we can start talking).

Things to consider that spring to mind:

  • What is the user experience impact. Should we remove APOLLO_GRAPH_REF as an environment variable?
  • Separate response pipeline per graph? Or just switch schema?
  • Telemetry plugin will need modification.
  • Are there any other schema aware plugins/layers.

BrynCooke avatar May 23 '22 06:05 BrynCooke

@yanns We do want external contributions, and we have started taking them in already. As you say though, this probably isn't the best ticket to start with. For context we're going to me making some structural changes in the next few weeks that probably means it's not quite the time to start coding on this (although we can start talking).

Things to consider that spring to mind:

* What is the user experience impact. Should we remove APOLLO_GRAPH_REF as an environment variable?

I don't really know - we don't use that for the moment.

* Separate response pipeline per graph? Or just switch schema?

For us, just switching the schema would be sufficient.

* Telemetry plugin will need modification.

* Are there any other schema aware plugins/layers.

For our use-case, I don't think so.

yanns avatar May 30 '22 07:05 yanns

We've checked variants, and it seems to be too static for our use-cases. Maybe we need a new async method in the plugin, that takes the request and the supergraph as input, and delivers the new supergraph to use as output.

Maybe it's even possible with https://github.com/apollographql/router/blob/dev/examples/supergraph_sdl/src/supergraph_sdl.rs#L30 ?

yanns avatar Sep 27 '22 06:09 yanns

I think that this approach won't work.

The issue is that plugins and caches all rely on the fact that schema is immutable. If we were to add such a callback there would be assumptions all over the codebase that would be incorrect.

Even if for your use case things did sort of work (which I suspect they wouldn't) we open ourselves and other users to weird errors in the future, and I can imagine that we'd sink a lot of time into this before getting it working properly.

There is another approach that you could take:

  • Plugins now receive the schema at startup. This is to support schema driven extensions.
  • When a request comes in validate the request against the your schema. You can take directives into account and make sure that users can't access stuff that they shouldn't.

This wouldn't work for adding types to your schema, so you need to make sure that everything is in your supergraph schema in the first place, but you could reject requests that try to access stuff they shouldn't.

BrynCooke avatar Oct 18 '22 08:10 BrynCooke

There is another approach that you could take:

* Plugins now receive the schema at startup. This is to support schema driven extensions.

* When a request comes in validate the request against the your schema. You can take directives into account and make sure that users can't access stuff that they shouldn't.

This wouldn't work for adding types to your schema, so you need to make sure that everything is in your supergraph schema in the first place, but you could reject requests that try to access stuff they shouldn't.

Thanks for the feedback. We were talking about that with Fernando Koch, and we came to a similar approach. This could work. Question about the introspection queries: the plugin also need to take care of those queries to "remove" the stuff they shouldn't access. Do you think that this is possible?

yanns avatar Oct 18 '22 09:10 yanns

I think it may be possible. Introspection happens during the query plan phase, so I think that you have the opportunity to modify introspection results in the supergraph service.

BrynCooke avatar Oct 18 '22 11:10 BrynCooke

Great news. We will try that.

yanns avatar Oct 18 '22 12:10 yanns

I've implemented a technical prototype. To validate the query against the chosen schema, to avoid implementing again some code that is already in the router, I had to make some types and functions public. https://github.com/sangria-graphql/sangria-federated/commit/98d19196db2fad1a460c018ee3232f6f18ebd2c2

            let configuration = apollo_router::Configuration::default();
            let schema =
                apollo_router::spec::Schema::parse(&supergraph_sdl, &configuration).unwrap();
            let result = apollo_router::spec::Query::parse(query, &schema, &configuration);

I had to make apollo_router::spec::Schema::parse and apollo_router::spec::Query::parse (and all related errors) public: https://github.com/yanns/router/tree/PoC_feature_toggle

Maybe we can iterate from here?

yanns avatar Oct 24 '22 16:10 yanns

It'd be better if you parsed directly with apollo-rs. All of the stuff in the spec package is going to disappear over time.

The background to this is that during initial dev apollo-rs did not exist, so we had to create our own data-structures. Now that apollo-rs is maturing we will be removing all of this stuff and querying the schema directly.

BrynCooke avatar Nov 01 '22 13:11 BrynCooke

Thanks for your feedback @BrynCooke I could parse the supergraph SDL and transform it using apollo_parser and apollo_encoder.

Something like:

        let parser = Parser::new(SUPERGRAPH_SDL);
        let ast = parser.parse();
        assert_eq!(ast.errors().len(), 0);

        let doc = ast.document();

        let mut pub_schema = apollo_encoder::Document::new();
        for definition in doc.definitions() {
            match definition {
                Definition::SchemaDefinition(e) => pub_schema.schema(e.try_into()?),
                Definition::OperationDefinition(e) => pub_schema.operation(e.try_into()?),
                Definition::FragmentDefinition(e) => pub_schema.fragment(e.try_into()?),
                Definition::DirectiveDefinition(e) => pub_schema.directive(e.try_into()?),
                Definition::ScalarTypeDefinition(e) => pub_schema.scalar(e.try_into()?),
                Definition::ObjectTypeDefinition(e) => {
                    let mut o =
                        apollo_encoder::ObjectDefinition::new(e.name().unwrap().source_string());
                    // [...] do not select fields with the @feature directive.
                    pub_schema.object(o);
                }
                Definition::InterfaceTypeDefinition(e) => pub_schema.interface(e.try_into()?),
                Definition::UnionTypeDefinition(e) => pub_schema.union(e.try_into()?),
                Definition::EnumTypeDefinition(e) => pub_schema.enum_(e.try_into()?),
                Definition::InputObjectTypeDefinition(e) => pub_schema.input_object(e.try_into()?),
                Definition::SchemaExtension(e) => pub_schema.schema(e.try_into()?),
                Definition::ScalarTypeExtension(e) => pub_schema.scalar(e.try_into()?),
                Definition::ObjectTypeExtension(e) => pub_schema.object(e.try_into()?),
                Definition::InterfaceTypeExtension(e) => pub_schema.interface(e.try_into()?),
                Definition::UnionTypeExtension(e) => pub_schema.union(e.try_into()?),
                Definition::EnumTypeExtension(e) => pub_schema.enum_(e.try_into()?),
                Definition::InputObjectTypeExtension(e) => pub_schema.input_object(e.try_into()?),
            }
        }
        Ok(pub_schema)

What I could not find is how to validate a query against a schema. Is this feature already available? Would you have any pointer?

yanns avatar Dec 14 '22 22:12 yanns

I think these features are coming in apollo-rs if they are not available already. Currently at least some of our validation is performed in deno during query planning via graphql.js. But again this isn't our long term goal.

BrynCooke avatar Dec 14 '22 23:12 BrynCooke

From a discussion with @lennyburdette on slack:

Today the compiler takes a single input, so what i’ve done to validate and interrogate operations is concatenate the SDL string and operation string together: let ctx = ApolloCompiler::new(format!("{}\n{}", &sdl, &operation)); then you can ctx.validate()

It's ok to test a prototype.

For production-like traffic, we would need https://github.com/apollographql/apollo-rs/issues/221

that ticket tracks the work to have the SDL already compiled, and then reuse it for different operations

yanns avatar Dec 19 '22 09:12 yanns

You can find the last state using apollo-rs here: https://github.com/sangria-graphql/sangria-federated/commit/0f4c9c4eb51c02fcfbef12c317cbf3a568e46f88

What is missing - prio 1:

  • queries validation should validate for fields that are not in the schema: https://github.com/sangria-graphql/sangria-federated/commit/0f4c9c4eb51c02fcfbef12c317cbf3a568e46f88#diff-a7953ccd810e9b013e1d457d5a2ede6d912da3f57d5065925cb267b84b9c2cdcR447-R456 - https://github.com/apollographql/apollo-rs/issues/392

What is missing - prio 2:

  • checking if a query is an introspection or not: https://github.com/sangria-graphql/sangria-federated/commit/0f4c9c4eb51c02fcfbef12c317cbf3a568e46f88#diff-a7953ccd810e9b013e1d457d5a2ede6d912da3f57d5065925cb267b84b9c2cdcR160
  • performant query validation using a schema that is already parsed: https://github.com/sangria-graphql/sangria-federated/commit/0f4c9c4eb51c02fcfbef12c317cbf3a568e46f88#diff-a7953ccd810e9b013e1d457d5a2ede6d912da3f57d5065925cb267b84b9c2cdcR103-R104 - https://github.com/apollographql/apollo-rs/issues/221

yanns avatar Dec 21 '22 06:12 yanns

Some progress: I could get query validation working in https://github.com/sangria-graphql/sangria-federated/commit/404293bd356490863863c67bd51f82b1aa2ff648

When we are not allowed to use the feature: Screenshot 2023-03-03 at 10 44 18

When we are allowed to use the feature: Screenshot 2023-03-03 at 10 44 26

The query validation is not 100% complete yet, but this is very promising.

yanns avatar Mar 03 '23 09:03 yanns

@yanns have you taken a look recently? the multi-document support is in there (via the issue you linked to above), and validation is also largely in place.

abernix avatar Aug 15 '23 13:08 abernix

Yes, we could implement a quite complex plugin that is sufficient for our needs for now. The downside is that we need to parse and validate the query in this plugin, and, if everything is ok, we let the router does its work, parsing and validating the query on its own. So we have a performance overhead of parsing the query 2 times. But we can live with that for now.

yanns avatar Aug 28 '23 15:08 yanns