Router: "Cannot return null for non-nullable field" when __resolveReference returns null for the parent object

Describe the bug: The Router returns "Cannot return null for non-nullable field" for fields on an object, even though __resolveReference returned null for the parent object.

To Reproduce: I created a repository that reproduces this: https://github.com/seerg0/apollo-router-issue-demo

In short, I have two subgraphs: users and products. The supergraph looks like this:

type User
  @join__owner(graph: USERS)
  @join__type(graph: USERS, key: "email")
{
  email: String! @join__field(graph: USERS)
  id: String! @join__field(graph: USERS)
  name: String @join__field(graph: USERS)
  userProduct: UserProduct @join__field(graph: USERS)
}

type UserProduct
  @join__owner(graph: PRODUCTS)
  @join__type(graph: PRODUCTS, key: "userId")
  @join__type(graph: USERS, key: "userId")
{
  productId: String! @join__field(graph: PRODUCTS)
  userId: String! @join__field(graph: PRODUCTS)
}

The User type has a userProduct field. This field is resolved by the products service through __resolveReference. In __resolveReference I check whether a product exists for this user and, if not, return null.

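const resolvers = {
  UserProduct: {
    // Called by the products subgraph for each UserProduct representation.
    __resolveReference: (reference) => {
      const userProduct = userProducts.find(
        (up) => up.userId == reference.userId
      );
      // find() yields undefined when nothing matches; return null explicitly.
      if (!userProduct) return null;
      return userProduct;
    },
  },
};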

But the UserProduct type has non-nullable fields, and the Router reports errors for them.

Expected behavior: With Apollo Gateway, I get null without errors. I expect the same behavior from the Router.

Output

Desktop (please complete the following information):

OS: macOS
Version 12.1

Additional context: Apollo Router version 1.1.0

Oct 03 '22 18:10 seerg0

I’ve reproduced this in a stand-alone (but not reduced) test case that shows subgraph requests and responses:


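// Imports assumed by the test below.
use apollo_router::services::supergraph;
use serde_json::json;
use tower::ServiceExt; // provides `oneshot` and `boxed`

#[tokio::test(flavor = "multi_thread")]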
async fn test_issue_1930() {
    let schema = r#"
        schema
            @core(feature: "https://specs.apollo.dev/core/v0.2"),
            @core(feature: "https://specs.apollo.dev/join/v0.1", for: EXECUTION)
        {
            query: Query
        }
        
        directive @core(as: String, feature: String!, for: core__Purpose) repeatable on SCHEMA
        
        directive @join__field(graph: join__Graph, provides: join__FieldSet, requires: join__FieldSet) on FIELD_DEFINITION
        
        directive @join__graph(name: String!, url: String!) on ENUM_VALUE
        
        directive @join__owner(graph: join__Graph!) on INTERFACE | OBJECT
        
        directive @join__type(graph: join__Graph!, key: join__FieldSet) repeatable on INTERFACE | OBJECT
        
        type Product {
            id: String!
            name: String
        }
        
        type Query {
            allUsers: [User] @join__field(graph: USERS)
            user(id: String!): User @join__field(graph: USERS)
        }
        
        type User
            @join__owner(graph: USERS)
            @join__type(graph: USERS, key: "email")
        {
            email: String! @join__field(graph: USERS)
            id: String! @join__field(graph: USERS)
            name: String @join__field(graph: USERS)
            userProduct: UserProduct @join__field(graph: USERS)
        }
        
        type UserProduct
            @join__owner(graph: PRODUCTS)
            @join__type(graph: PRODUCTS, key: "userId")
            @join__type(graph: USERS, key: "userId")
        {
            productId: String! @join__field(graph: PRODUCTS)
            userId: String! @join__field(graph: PRODUCTS)
        }
        
        enum core__Purpose {
            """
            `EXECUTION` features provide metadata necessary to for operation execution.
            """
            EXECUTION
        
            """
            `SECURITY` features provide metadata necessary to securely resolve fields.
            """
            SECURITY
        }
        
        scalar join__FieldSet
        
        enum join__Graph {
            PRODUCTS @join__graph(name: "products" url: "http://localhost:4011")
            USERS @join__graph(name: "users" url: "http://localhost:4010")
        }
    "#;
    let router = apollo_router::TestHarness::builder()
        .schema(schema)
        .subgraph_hook(|subgraph_name, default| match subgraph_name {
            "users" => apollo_router::plugin::test::MockSubgraph::builder()
                .with_json(serde_json::json!({  
                    "query": "query User__users__0($userId:String!){user(id:$userId){id name userProduct{__typename userId}}}",
                    "operationName": "User__users__0",
                    "variables": {
                        "userId": "user-2"
                    }
                }), serde_json::json!({
                    "data": {
                        "user": {
                            "id": "user-2",
                            "name": "user-2",
                            "userProduct": {
                                "__typename": "UserProduct",
                                "userId": "user-2"
                            }
                        }
                    }
                }))
                .build()
                .boxed(),
            "products" => apollo_router::plugin::test::MockSubgraph::builder()
                .with_json(serde_json::json!({  
                    "query": "query User__products__1($representations:[_Any!]!){_entities(representations:$representations){...on UserProduct{productId}}}",
                    "operationName": "User__products__1",
                    "variables": {
                        "representations": [
                            {"__typename": "UserProduct", "userId": "user-2"}
                        ]
                    }
                }), serde_json::json!({
                    "data": {
                        "_entities": [null]
                    }
                }))
                .build()
                .boxed(),
            _ => default,
        })
        .build()
        .await
        .unwrap();
    let query = r#"
        query User($userId: String!) {
            user(id: $userId) {
                id
                name
                userProduct {
                    productId
                    userId
                }
            }
        }
    "#;
    let request = supergraph::Request::fake_builder()
        .query(query)
        .variable("userId", "user-2")
        .build()
        .unwrap();
    let response = router
        .oneshot(request)
        .await
        .unwrap()
        .next_response()
        .await
        .unwrap();
    assert_eq!(
        response.data,
        Some(json!({
            "user": {
                "id": "user-2",
                "name": "user-2",
                "userProduct": null
            }
        }))
    );
    assert_eq!(response.errors, []);
}

Oct 04 '22 16:10 SimonSapin

Reformatted subgraph queries:

query User__users__0($userId: String!) {
  user(id: $userId) {
    id
    name
    userProduct {
      __typename
      userId
    }
  }
}

query User__products__1($representations: [_Any!]!) {
  _entities(representations: $representations) {
    ... on UserProduct {
      productId
    }
  }
}

Oct 04 '22 16:10 SimonSapin

I understand this better after checking with the team. The error message is correct, and was added on purpose (even if Apollo Gateway doesn’t have it) to point out nullification in case it’s unexpected.

Indeed, __resolveReference in the products subgraph returns null for the userProduct. But before it gets to that, the Router first made a request to the users subgraph which returns userProduct: {__typename: "UserProduct", userId: "user-2"} without a productId.

When merging that with userProduct: null from the products graph, the Router ends up with a non-null userProduct but no productId. Since productId is not allowed to be null, the whole userProduct is turned into null instead. In this case it’s as intended, so you can safely ignore this error.

I agree that the error message is misleading since nothing explicitly claimed productId: null.

Oct 11 '22 13:10 SimonSapin

Hello @SimonSapin,

thank you for looking into this issue.

> When merging that with userProduct: null from the products graph, the Router ends up with a non-null userProduct but no productId.

I don't understand what you mean by that. Given the above schema definition, userProduct is nullable. Why would the router end up with a non-nullable type then? Yes the requested productId is non-nullable, but why would a non-nullable child-field make the parent non-nullable as well?

Let's take federation and the router out of the equation. Take the example schema above and send the following operation:

query GetUser {
	user(id: "123") {
		id
		name
		userProduct {
			id
			userId
			productId
		}
	}
}

If a user doesn't have a UserProduct and the resolver therefore returns null, the operation wouldn't fail (in plain GraphQL), because User.userProduct is nullable and nullability is inherited from the parent by the child fields.
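To make the plain GraphQL case concrete, here is a minimal sketch of how a nullable object field behaves when its resolver returns null. The function name and response shape are made up for illustration, and this is not graphql-js, only a model of the rule described above.

```javascript
// Simplified, hypothetical executor for one nullable object field.
// It only mirrors the rule described above; it is not a real GraphQL engine.
function executeUserProduct(resolveUserProduct, selection) {
  const errors = [];
  const parent = resolveUserProduct();

  // User.userProduct is nullable, so a null parent is a valid result and
  // its child fields (including the non-nullable ones) are never visited.
  if (parent === null || parent === undefined) {
    return { data: { userProduct: null }, errors };
  }

  // Otherwise copy the selected fields from the resolved parent.
  const result = {};
  for (const field of selection) {
    result[field] = parent[field];
  }
  return { data: { userProduct: result }, errors };
}

// The user has no UserProduct, so the resolver returns null.
const response = executeUserProduct(() => null, ["productId", "userId"]);
console.log(JSON.stringify(response));
// prints {"data":{"userProduct":null},"errors":[]}
```

No error is recorded here because the non-nullable child fields are never reached.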

Now we translate that scenario to Federation v2 with apollo-router. __resolveReference for the userProduct returns null, but nullability is not inherited from the parent by the child. Instead, the non-nullability of the child fields travels up the graph and applies to the parent as well.

I understand that this is because the User subgraph returned a reference id to a UserProduct, but how should the User subgraph predict how a third-party subgraph will react?

I think the beauty of federation is now broken, because subgraphs are no longer encapsulated services that can work on their own. They now have to be tightly coupled, because every subgraph needs to know about the state and behaviour of the other subgraphs.

Let me give you an overview how we use GraphQL Federation.

We have separate autonomous teams working on their part of the supergraph. The clients that are requesting fields on the supergraph may have different permissions. User A may have admin access and can see all UserProducts, but user B might not be able to see UserProducts at all. As the UserProduct team is autonomous, they check if a user has permission to view UserProducts, and if they don't, their __resolveReference might throw an error informing about the missing permission.

With the now altered behaviour, every subgraph that uses federated entities via reference ids will have to apply the permission check on its end already. This means the User subgraph has to predict what the UserProduct subgraph will answer. But in order to apply such checks, the User subgraph basically has to know more than just the ID of the UserProduct. This may even reach the point where the User subgraph duplicates the UserProduct data in its own state in order to correctly predict how UserProduct will respond, which makes federation impractical: all subgraphs might as well form a monolithic GraphQL service.

This of course also applies to other reasons why the UserProduct subgraph returns null in the __resolveReference resolver.

> In this case it’s as intended, so you can safely ignore this error.

This would potentially not be so much of an issue, because the functionality itself still works; there is just this additional error being returned. But there is no way a client can know what this error means and that it's safe to ignore. It looks exactly like any other non-null error that is potentially a signal of a bug in the subgraph.

I would at least like to see this behaviour made configurable, to be honest, because we cannot use apollo-router as it is right now. In this state we need to consider switching back to apollo-gateway for the time being.

Thanks a lot, really appreciated 🙏

Nov 04 '22 15:11 KennethWussmann

To give a bit of context about what is happening here:

query User($userId: String!) {
  user(id: $userId) {
    id
    name
    userProduct {
      productId
      userId
    }
  }
}

This will first call the USERS subgraph and return:

{
  "user": {
      "id": "1",
      "name": "1",
      "userProduct": {
          "__typename": "UserProduct",
          "userId": "1"
      }
  }
}

So we already have the UserProduct.userId field, because it is used as key to then query the PRODUCTS subgraph.

Then when PRODUCTS receives this query:

query User__products__1($representations:[_Any!]!) {
  _entities(representations:$representations) {
    ...on UserProduct{
      productId
    }
  }
}

it is only asked for the UserProduct.productId field. This subgraph returns null. The router merges this with the existing data, which does not change it (if a subgraph returning null for an entity overwrote what other subgraphs returned for that entity, that would be a whole other mess).

Then the router applies the nullability rules as specified and when it finds

{
  "__typename": "UserProduct",
  "userId": "1"
}

while it expects

{
  productId
  userId
}

it tries to replace productId with null, but because productId is non-nullable, this nullifies the entire userProduct field.
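The two steps described above (merge, then apply nullability rules) can be sketched roughly as follows. This is a simplified model under stated assumptions, not the Router's actual implementation: mergeEntity, formatObject, and the error shape are hypothetical names chosen for illustration.

```javascript
// Hypothetical, simplified model of the merge and nullification steps.
// Function names and the error shape are illustrative, not the Router's code.

// Merge one _entities result into data already returned by another subgraph.
// A null entity contributes nothing; it does not erase existing fields.
function mergeEntity(existing, entity) {
  if (entity === null) return existing;
  return { ...existing, ...entity };
}

// Format an object against the client's selection set. A missing or null
// value for a non-nullable field nullifies the whole object.
function formatObject(typeName, value, selection, nonNull, path, errors) {
  const out = {};
  for (const field of selection) {
    const fieldValue = value[field] ?? null;
    if (fieldValue === null && nonNull.has(field)) {
      errors.push({
        message: `Cannot return null for non-nullable field ${typeName}.${field}`,
        path: [...path, field],
      });
      return null; // the parent (userProduct) becomes null
    }
    out[field] = fieldValue;
  }
  return out;
}

const fromUsers = { __typename: "UserProduct", userId: "1" }; // USERS response
const fromProducts = null; // PRODUCTS returned "_entities": [null]

const merged = mergeEntity(fromUsers, fromProducts);
const errors = [];
const userProduct = formatObject(
  "UserProduct",
  merged,
  ["productId", "userId"], // fields the client asked for
  new Set(["productId", "userId"]), // both are declared String!
  ["user", "userProduct"],
  errors
);

console.log(userProduct, errors.length);
```

In this model the merged object still carries userId from USERS, so the formatter sees a non-null UserProduct that lacks productId, records one error, and returns null for userProduct.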

So this is all expected behaviour. The annoying part (and why I just opened https://github.com/apollographql/router/issues/2071) is that the Cannot return null for non-nullable field message should not be an error, as nullification is part of normal operation. But it should still be there in extensions, otherwise people have no way to debug why suddenly half of their response is null.

Nov 09 '22 09:11 Geal

Fixed by https://github.com/apollographql/router/issues/2071

Nov 23 '22 15:11 Geal
