apollo-federation-ruby Representation fields in entity queries are not validated at query execution time

It's currently possible to issue an entities query with representations that have non-existent fields on the schema. Using an example for testing that's provided in your README:

it "resolves the blog post entities" do
  blog_post = BlogPost.create!(attributes)

  query = <<~GRAPHQL
    query($representations: [_Any!]!) {
      _entities(representations: $representations) {
        ... on BlogPost {
          id
          title
          body
        }
      }
    }
  GRAPHQL

  variables = { representations: [{ __typename: "BlogPost", id: blog_post.id }] }

  result = Schema.execute(query, variables: variables)

  expect(result.dig("data", "_entities", 0, "id")).to eq(blog_post.id)
end

The above works correctly. If you using the following representation with a bad typename:

 { representations: [{ __typename: "BlogPostDoesNotExist", id: blog_post.id }] }

You get a runtime error because the schema fails to resolve BlogPostDoesNotExist to a known type. This is good!

However, If you do this:

 { representations: [{ __typename: "BlogPost", id: blog_post.id, randomFieldThatDoesNotExist: 5 }] }

The query runs without error despite the field randomFieldThatDoesNotExist being present (undefined in the BlogPost type).

Correct me if I'm wrong, but I believe all representations for a given type K that are being issued to a subgraph must only reference a set of fields that are a subset of fields defined for K in the subgraph. So For example, if I have type BlogPost that has fields A, B, and C. All entities queries containing representations for BlogPost must, at minimum, contain the relevant key fields and any other field that's defined on BlogPost that's a subset of {A B C}.

PS: Thank you for this gem!

Apr 03 '23 15:04 linstantnoodles

@linstantnoodles Thanks for the issue submission. From a quick glance at the federation subgraph specification docs, I see only two constraints on the representations argument:

Each entry in the representations list must be validated with the following rules:

A representation must include a __typename string field.
A representation must contain all fields included in the fieldset of a @key directive applied to the corresponding entity definition.

While there is nothing that I can find that requires the validation behavior you suggest, I understand where you are coming from. I think the impact of adding extra invalid fields is that they are essentially dropped/ignored, so unless someone is abusing the representations for purposes of authentication or authorization, I don't know that there is any kind of risk in leaving the behavior as-is. What do you think?

Apr 03 '23 15:04 grxy

@grxy Thanks for the speedy reply!

That's true - the federation specs don't actually specify validation behavior for entity fields. I also realized that since the input type of the representation objects are currently Any types, you can't really perform validations on the input types at query parse time the same way graphql implementations validate arguments for standard / normal queries...

I think the impact of adding extra invalid fields is that they are essentially dropped/ignored, so unless someone is abusing the representations for purposes of authentication or authorization

I agree - our gateway currently issues these queries and we have a schema registry that performs schema validation so it's not going to include invalid fields in practice. Like you said if a bad actor chooses to issue invalid fields that don't exist, they'll just be ignored.

The problem I'm pointing out specifically is within the context of pre-release testing (and as you pointed out I don't think this is a problem with this gem itself). The absence of any input validation of the representation inputs means that I can reference any key on the representation object inside the reference resolvers even if it doesn't correspond to an actual field.

For example:

Given this representation with an invalid field:

{ representations: [{ __typename: "BlogPost", id: blog_post.id, randomFieldThatDoesNotExist: 5 }] }

It's currently possible within a reference resolver to do object[:randomFieldThatDoesNotExist] and get whatever value you stubbed within a test even if it's not a real field, which means you can build business logic depending on a key that's not actually defined on the type itself. We can catch these types of runtime errors at integration, but it just would be ideal for the query parser can validate input shapes earlier in the process.

I'll raise this with the apollo team and get their thoughts on this! Given the dynamic nature of representations in general, it's probably not straightforward to set more concrete type definitions for those inputs.

Apr 04 '23 14:04 linstantnoodles

Found this paragraph in the spec:

Each item in the representations list is an _Any scalar. This is a federation-specific scalar defined in Subgraph schema additions. This scalar is serialized as a generic JSON object, which enables the graph router to include representations of different entities in the same query, all of which can have a different shape.

Link

I didn't realize you can have different entity representations in an entities query! If that's the case, I don't see how it's possible to type the representation inputs to anything other than Any 😢

Apr 04 '23 14:04 linstantnoodles

@linstantnoodles You're right that a schema-level validation would not be possible (until input unions are supported at least), but we could add an extra runtime validation check in theory. The idea that comes to mind is that we could potentially validate shapes based on the included __typename field, i.e. when a representation hash comes in, we could look at the schema's type map and verify that all included fields are valid for the provided __typename.

Apr 04 '23 15:04 grxy

@grxy Yes that's true - I think having that runtime validation would be nice because clients (even if it's just the router) should be notified if they're attempting to resolve types with bad input (representations containing fields that don't exist). I think that behavior would be consistent with the semantics of how we treat invalid arguments into query field i.e blogPosts(notAField: true). That being said, this will probably be a breaking change so while it's nice to have I'm not sure if it's worth introducing it if existing apps have this assumption already baked in.

May 22 '23 16:05 linstantnoodles