apollo-feature-requests icon indicating copy to clipboard operation
apollo-feature-requests copied to clipboard

Add `context.headers` to be used as the cache key to better support Localisation and Experiments use cases

Open thifike opened this issue 2 years ago • 14 comments

As described in the title, I would like the context.headers variables also be used in calculating the cache key in the Apollo Client to better support at least the following use cases:

  • Localisation - based on language preference or physical location or both
  • Experiments - Gradually roll out feature changes that impact resolver behaviour

Previous, similar issues:

  • https://github.com/apollographql/apollo-client/issues/1420
  • https://github.com/apollographql/apollo-feature-requests/issues/232
  • https://github.com/apollographql/apollo-feature-requests/issues/316

Use cases

Although the previous issues above described some parts of the problem they are missing important information that would allow you to gauge how serious this issue is.

We are operating a big federated GraphQL service with roughly several dozens of GraphQL servers behind the gateway. We use the Graph internally, but we also expose (parts of) it to external customers.

We need to support Localisation by being able to specify Language and Country/Region and we also need to support Experiments where the number of parameters that can be set are practically endless.

The problem

We are using HTTP headers to pass these values to our Graph. When I create an Apollo Client using basic settings with an in-memory cache, the client breaks immediately when we send the exact same query with the exact same parameters but different language, country or experiment values in the headers.

Apollo Client:

const apolloClient = new ApolloClient({
      cache: new InMemoryCache(),
      ...
    });

Query:

{
   headers: {
    "my-language": "en-US",
    "my-country": "US",
    "my-experiment-1728": "true",
    "my-experiment-2833": "8",
  },
  query: gql`
    {
      books(id: "0123456789") {
        title
        type
        publicationYear
      }
    }
  `;
}

Response (fetched from the network):

{
  "data": {
    "books": {
      "title": "War and Peace",
      "type": "Paperback",
      "publicationYear": 2011,
    }
  }
}

And if we send the exact same request but with fr-FR and FR respectively, it is immediately returned from the cache as “War and Peace” instead of going out to the network and fetch “Guerre et Paix” from the server. The only thing we can influence on our end is to turn off caching completely and in that case the client behaves as it should, but to us that’s not an acceptable solution.

Why not just pass these as Query variables?

The first similar issue above (https://github.com/apollographql/apollo-client/issues/1420) contains some recommendations from you that we should avoid this problem by adding these extra parameters as query variables however that is not an option because:

  • The number of variables are practically endless if we add Experiments to the mix
  • This would also result in a schema change when adding a new variable which (depending on implementation details) might break the integration with our external (or even internal) customers

Possible solution

I believe one solution could be to modify QueryManager’s fetchQueryByPolicy<TData, TVars> method (https://github.com/apollographql/apollo-client/blob/main/src/core/QueryManager.ts#L1337) to pass context.headers to queryInfo.getDiff() and then pass those values further down and respecting them in ^StoreReader.diffQueryAgainstStore<T>` (https://github.com/apollographql/apollo-client/blob/8beb4820edc6352996e08f7f73bde3573f1eb666/src/cache/inmemory/readFromStore.ts#L220).

I’m more than happy to send in a pull request for this, but first I would like to understand whether there is a better solution to our problem that I might not be aware of.

Thanks, Béla

thifike avatar Apr 14 '22 10:04 thifike

We have a similar problem in a multi-tenant SaaS. We use the headers to specify which tenant is making the request but we also have internal tooling that has access to multiple tenants. Sadly the tenant information is ignored by the caching, leading to wrongly returned data. The current workaround is to also pass the tenant information as query variables.

Would be interested to hear if there will be an option in the future, to specify certain headers that should be taken into account.

sanjom avatar Apr 22 '22 09:04 sanjom

I concur. Apollo should expose the operation's context in the caching function. Sometimes the operation definition and returned data are not enough to construct a reliable cache key. For example, two users with different permissions may be doing the same operation on the same host, which would result in them sharing the invalid cached state. This is relevant in the context of building applications for multi-user support:

  1. I sign in as UserA, and perform an operation.
  2. The operation result is cached in CacheA.
  3. I sign out, then sign in as UserB. Due to my permissions, the operation result will be different while the operation definition will remain the same.
  4. I perform the same operation.
  5. I get the CacheA, although its results are irrelevant to me (moreover, invalid).

kettanaito avatar Jun 10 '22 12:06 kettanaito

+1

daveykane avatar Jun 15 '22 16:06 daveykane

+1

yasircodingcrafts avatar Oct 07 '22 12:10 yasircodingcrafts

+1

marine-mb avatar Oct 11 '22 12:10 marine-mb

+1

trtin avatar Nov 06 '22 05:11 trtin

+1

duveborg avatar Jan 23 '23 07:01 duveborg

+1

ghost avatar Jul 06 '23 14:07 ghost

+1

brauliolledo avatar Aug 15 '23 20:08 brauliolledo

+1

auirarrazaval avatar Nov 14 '23 04:11 auirarrazaval

@kettanaito If you have multiple users on a server, you should have multiple ApolloClient/InMemoryCache instances, one for each request - and the same also goes for the client. The moment a user logs out, you should either reset the cache you are using, or (even better!) recreate the full Apollo Client.

(Prefacing this next part with: these are my feelings here at this point, we might come to a different conclusion as a team anyways.)

I have to admit that I have very mixed feelings about this feature request in general - Apollo Client is a GraphQL client, and the GraphQL spec doesn't know the concept of headers (apart from the Content-Type and Accept headers in the GraphQLOverHTTP spec).
Other transports could use different mechanisms than headers, and even HTTP itself could just use a credentials: "include" and the contents of the cookies that were sent along to the server would never be known to the client in the first place. Even if we would ignore all these things and nail it down to "we only consider the HTTP case with headers", there often is no good way to distinguish if a change in the headers means "the user just refreshed their access token" and "it's a different user".

As of right now, the best approach to that (in non-auth-cases) would be to add query parameters or send additional variables (if you don't want to add them to the schema, you could step outside of the GraphQL spec at your own risk and set includeUnusedVariables on the client and just ignore them on the server).

In authentication cases, see my reasoning above: please reset your store on logout. You don't want user B going into the devtools and seeing user A's data readily available anyways.


PS: I just realized I answered to a very old reply here - I'm sorry! GitHub hid all the +1 responses and showed your response as the last one @kettanaito 😓

phryneas avatar Nov 14 '23 09:11 phryneas

@phryneas Thank you for the response!

I do have one question regarding the following:

As of right now, the best approach to that (in non-auth-cases) would be to add query parameters or send additional variables (if you don't want to add them to the schema, you could step outside of the GraphQL spec at your own risk and set includeUnusedVariables on the client and just ignore them on the server).

(disclaimer: we are using @vue/apollo-composable, so I'm sorry if this is not an issue with apollo-client)

We have actually tried this, but encountered the following issue. For a specific part of our app, we need to run multiple concurrent queries through apollo, where the only thing that changes is a header (I agree that this is not ideal, but it is what it is).

  • On a first iteration, and because no variables changed between queries, Apollo would execute just once the request to the server, as it considered that all queries were identical. Upon receiving a response, it is propagated to every instance of the query executed. I now understand why this happens, so all good here
  • On a second iteration, we duplicated the (unused) variable in the variables object of the query execution. This meant that all queries were executed to the server. However, upon response, all results mapped to the same cache, as only used variables seem to be considered there. This meant that eventually all results in the client would contain the latest content in the cache. That is, whichever response came in last.
  • Finally, in order to make this work, we had to disable the cache for the specific query using a no-cache fetch Policy. This is actually alright for this specific case-scenario. However, It brought a question to my head:

It seems to me that there's a discrepancy in the way different queries are detected in the cache and upon deduplication of requests. When determining the cache difference, only used variables are considered, but when avoiding duplicated requests, all variables (used and unused) are considered.

Leaving aside the fact that this is an extremely hacky border case, and maybe not even remotely relevant for the general case scenario, shouldn't these two detection methods be the same? After all, they are both determining the same thing: is query A the same as query B. It doesn't seem right to me that the criteria for the same question is different.

auirarrazaval avatar Nov 14 '23 17:11 auirarrazaval

It doesn't seem right to me that the criteria for the same question is different.

This is because a query can have multiple fields:

query MyQuery($var1: String!, $var2: String!) {
  foo(arg: $var1) { something }
  bar(arg: $var2) { something }
}

here, from the perspective of the request, you're sending $var1 and $var2. But from the cache perspective, you have two separate fields, foo that has to be stored for a call with $var1 and bar that has to be stored for a call with $var2.

phryneas avatar Nov 15 '23 11:11 phryneas

@phryneas

As of right now, the best approach to that (in non-auth-cases) would be to add query parameters or send additional variables (if you don't want to add them to the schema, you could step outside of the GraphQL spec at your own risk and set includeUnusedVariables on the client and just ignore them on the server).

Is includeUnusedVariable documented anywhere? I can't seem to find anything on it

gsimon2 avatar Jan 17 '24 20:01 gsimon2