apollo-datasource-http icon indicating copy to clipboard operation
apollo-datasource-http copied to clipboard

Request deduplication (memoization) does not occur across concurrently executing frames

Open bmowis opened this issue 2 years ago • 3 comments

This may be more of a feature request than a bug, as 'memoization' is admittedly currently 'working as designed'.

However, my understanding of memoization is that multiple independent tasks should be allowed to share a single common method invocation and its corresponding promised result. So for example, if 6 threads in an execution frame call into a function with the same expectation for results (e.g. Same provided parameters) they will all get the shared result without the need to execute the function 6 times.

However, it looks as though the HTTPDataSource.performRequest() method is not memoizing the Promise until after the result is already received. This means that concurrent, memoized access to an HTTPDataSource API method will still result in multiple simultaneous and redundant outbound calls to the HTTP endpoint. If one of the outstanding goals of memoization is request de-duplication, than this should not be so.

~Line 313 of HTTPDataSource.ts: const responseData = await this.pool.request(requestOptions)

Then, ~ line 336:

      if (this.isRequestMemoizable(request)) {
        this.memoizedResults.set(cacheKey, response)
      }

In Apollo Server, this limitation is easily observable in the FIRST query returning a requested entity when multiple fields in the same schema object explicitly use the same resolver (and subsequently, the same DataSource API method).

// Query
query getMovie {
  movie(id: 12) {
    name,
    director,
    year
  }
}


// Schema
type Query {
  movie(id: Int!): [Movie]
}

type Movie {
  name: String
  director: String
  year: Int
}


// Resolver
// For this resolver, assume that the getMovie(id, ctx) method uses an HTTPDataSource implementation
// to fetch a single Movie object
const resolver: Resolvers = {
  Query: {
    movie: (root, { id }, ctx): => ({ id })
  },
  Movie: {
    name: async ({ id }, args, ctx) => {
      const { name } = await getMovie(id, ctx);
      return name;
    },
    directory: async ({ id }, args, ctx) => {
      const { director } = await getMovie(id, ctx);
      return director;
    },
    year: async ({ id }, args, ctx) => {
      const { year } = await getMovie(id, ctx);
      return year;
    }
  }

In the above circumstance, we have proper, independent field-level resolvers. However, each field resolver here will in fact invoke their own outbound HTTP call (with the same input parameters and expected result), missing both the memoized results LRU map and the cache, despite the fact that multiple calls are duplicitous and unnecessary.

If feasible, sharing function invocations - not just function results - would improve memoization 'hits' in concurrent-access scenarios. I believe a working approach to this could be to memoize the Promised result before it is actually received, rather than just the finished result itself. Like this:

    const responsePromise = await this.pool.request(requestOptions)
    if (this.isRequestMemoizable(request)) {
        this.memoizedResults.set(cacheKey, responsePromise)
    }

    // Now we can await the result and finish the rest of performRequest() that requires an actual result.
    const response = await responsePromise;

If further explanation or an example is required, let me know.

bmowis avatar Feb 10 '22 14:02 bmowis

For the record, I am currently working around this issue in the above example by calling the getMovie() method in the parent/root resolver (child fields need to wait for parent fields to resolve to scalars prior to execution, so children will have access to the parent's memoized result by the time they attempt their requests).

However, this solution seems bloaty and may not be possible in more complex modelling/querying scenarios. For instance, two separate root-level queries may be executing and accessing the same HTTPDataSource API, and they should be able to leverage a memoized promise so that there is only a single outbound HTTP call between the two queries.

bmowis avatar Feb 10 '22 15:02 bmowis

I think we should use DataLoader for batching and caching in a single request. I am writing something like this as a workaround. Maybe I'll make a PR for this later if I have time.

class Example extends HTTPDataSource {
  private memoizeLoaderCache = new Map<string, DataLoader<string, unknown>>()

  protected async memoizedGet<T = unknown>(
    path: string,
    query: unknown = {},
  ): Promise<T> {
    // generate a unique key for each request like onCacheKeyCalculation
    const cacheKey = path + ' ' + JSON.stringify(query)

    // check if the data loader exists
    let loader = this.memoizeLoaderCache.get(cacheKey) as DataLoader<string, T>
    if (!loader) {
      // create a data loader for the cache key
      loader = new DataLoader<string, T>(async (keys: readonly string[]) => {
        const { body } = await this.get<T>(path, { query })
        return keys.map((_) => body)
      })
      this.memoizeLoaderCache.set(cacheKey, loader)
    }

    // register your request
    const res = await loader.load(cacheKey)
    return res
  }
}

llc1123 avatar Mar 22 '22 17:03 llc1123

https://github.com/StarpTech/apollo-datasource-http/pull/37 addresses this. #37

bradleymorris avatar Jun 30 '22 15:06 bradleymorris