type-graphql icon indicating copy to clipboard operation
type-graphql copied to clipboard

Feature: Data loader integration

Open MichalLytek opened this issue 6 years ago • 45 comments

https://github.com/facebook/dataloader

  1. Caching resolving the same data
  2. Instead of normal resolving:
@FieldResolver()
author(@Root() recipe: Recipe) {
  return this.userRepository.findById(recipe.authorId);
}

Introduce batched resolving - taking array of roots:

@FieldResolver()
async author(@Root() recipes: Recipe[]) {
  const ids = recipes.map(r => r.authorId);
  return this.userRepository.findByIds(ids);
}
  1. Decorator:
@Mutation()
async mutationThatAffectLoader(
  @Arg("input") input: SomeInput,
  @DataLoader(User) loader,
) {
  const result = await this.mutateSomething(input);
  loader.clear(input.someKey);
  return result;
}

MichalLytek avatar Mar 27 '18 19:03 MichalLytek

Looking forwards to it on solving n+1 requests.

laukaichung avatar Apr 29 '18 06:04 laukaichung

I'm not sure how it's supposed to work.

Does @FieldResolver() automatically cache the returned result with dataloader?

@FieldResolver()
author(@Root() recipe: Recipe) {
  return this.userRepository.findById(recipe.authorId);
}

It looks like I don't need to do anything at all that involves dataloader. If it's true, I'll wait for the integration instead of starting to use dataloader now.

laukaichung avatar Apr 30 '18 03:04 laukaichung

@Resolver(() => Post)
class PostResolver {
  @Query(returns => [Post])
  posts() {
    return this.manager.find(Post);
  }

  @FieldResolver()
  comments(@Root() posts: Post[]) {
    const postIds = posts.map(post => post.id);
    const comments = this.manager.find(Comment, { postId: In(postIds) });
    // grouping and providing correct order for dataloader
    return posts.map(post =>
      comments.filter(comment => comment.postId === post.id)
    );
  }
}

So executing this query:

query {
  posts {
    id
    title
    comments {
      text
    }
  }
}

will call DB two times - one for fetching posts collection, one for fetching comments collection, because comments resolver will be batched. And yes, field resolvers will be cached, so calling the field resolver with the same parent and args data will not execute it again.

MichalLytek avatar Apr 30 '18 07:04 MichalLytek

This is great. It looks like the dataloader is working under the hood in @FieldResolver. I don't even need to call up @DataLoader(Comment) and explicitly cache the fetched comments in @FieldResolver.

laukaichung avatar Apr 30 '18 08:04 laukaichung

When I've used DataLoader before, I've generally used it at the repository layer, rather than on resolvers. The main benefit of this being that it doesn't leak logic around batching into the service layer (requiring everything in the app to work on n ids rather than 1).

The main downside is that it requires me to construct services & repositories per-request. Which I can't really figure out how to do with TypeGraphQL — would require the resolvers & DI context to be constructed per request.

What do you think about solving this problem by allowing this, rather than bundling dataloader support into the resolver?

chrisdevereux avatar May 25 '18 12:05 chrisdevereux

It's easy to solve - create a global middleware that will create new dataloaders and attach them to context. Then you can use the loaders from context in resolver class methods.

const DataLoader: MiddlewareFn = ({context}, next) => {
  if (!context.isDataLoaderAttached) {
    context.isDataLoaderAttached = true;

    context.personLoader = new DataLoader(/* ... */);
    // ...
  }
  return next();
}

Of course you would need to provide new context object in express-graphql or graphql-yoga.

I'm not a fan of creating DI Container for each "request" as it denies the idea of single-threaded Node.js.

MichalLytek avatar May 25 '18 13:05 MichalLytek

Is anyone aware of a workable strategy to get dataloader integrated with type-graphql today?

ashleyw avatar Jul 24 '18 15:07 ashleyw

@ashleyw you can always fallback to the plain apollo-like resolvers strategy - create dataloaders for your app for each request (like in example above - new DataLoader) and then retrieve them using @Ctx decorator for use in your resolvers.

MichalLytek avatar Jul 24 '18 15:07 MichalLytek

@19majkel94 thanks, I'll look into it!

Is there anything you've discovered that prevents the batched resolving you posted above? That seems like it would be a sweet, almost automatic solution!

ashleyw avatar Jul 24 '18 15:07 ashleyw

@ashleyw This is how it works in @pleerock's vesper and it's quite easy to ~copy~ reimplement it in TypeGraphQL. But I am really busy right now, so bugfixing and maintance is all I can do right now 😞

MichalLytek avatar Jul 24 '18 15:07 MichalLytek

This should get you started. Create a DataLoaderMiddleware:

import DataLoader = require('dataloader');
import { MiddlewareInterface, NextFn, ResolverData } from 'type-graphql';
import { Service } from 'typedi';

import { Context } from './context.interface';

@Service()
export class DataLoaderMiddleware implements MiddlewareInterface<Context> {
  async use({ root, args, context, info }: ResolverData<Context>, next: NextFn) {
    if (!context.dataLoader.initialized) {
      context.dataLoader = {
        initialized: true,
        loaders: {}
      };

      const loaders = context.dataLoader.loaders!;

      context.connection.entityMetadatas.forEach(entityMetadata => {
        const resolverName = entityMetadata.targetName;
        if (!resolverName) {
          return;
        }

        if (!loaders[resolverName]) {
          loaders[resolverName] = {};
        }

        entityMetadata.relations.forEach(relation => {
          // define data loader for this method if it was not defined yet
          if (!loaders[resolverName][relation.propertyName]) {
            loaders[resolverName][relation.propertyName] = new DataLoader((entities: any[]) => {
              return context.connection.relationIdLoader
                .loadManyToManyRelationIdsAndGroup(relation, entities)
                .then(groups => groups.map(group => group.related));
            });
          }
        });
      });
    }
    return next();
  }
}

Then attach when building your schema:

  const schema = await buildSchema({
    globalMiddlewares: [DataLoaderMiddleware],
    resolvers: [__dirname + '/**/*.resolver.ts']
  });

Then you can use in your FieldResolvers like so:

  @FieldResolver()
  photos(@Root() user: User, @Ctx() ctx: Context): Promise<Photo[]> {
    return ctx.dataLoader.loaders.User.photos.load(user);
  }

This still requires a bunch of boilerplate. I'd love to figure out a way to automatically create these field resolvers while also getting the schema generated and type safety.

goldcaddy77 avatar Jul 27 '18 22:07 goldcaddy77

@19majkel94 Hey, I would like to ask if the current approach with FieldResolver() for batching is working or not. Im trying to use the FieldResolver for batching but its not working. More in description in my code

import { Args, FieldResolver, Query, Resolver, Root } from "type-graphql";
import { Page } from "../../entity/Page/Page";
import { PageArgs } from "./PageArgs";
import { EntityManager, getManager } from "typeorm";
import { PagePart } from "../../entity/PagePart/PagePart";

@Resolver(() => Page)
class PageResolver {
  private manager: EntityManager;
  constructor() {
    this.manager = getManager();
  }

  // This query is called once and receive all pages from DB in array
  // Each page has some parts which are fetched separately using FieldResolver.
  @Query(() => [Page])
  pageAll(@Args() args: PageArgs): Promise<Page[]> {
    return this.manager.find(Page);
  }

  @FieldResolver(() => PagePart, { nullable: true })
  async pageParts(@Root() pages: Page[]) {
    // pages arg should contain an array of all fetched pages from pageAll()
    // BUT instead im receiving in pages arg a single page as object
    // Means: If I fetch 50 pages, the FieldResolver is called 50 times
    // receiving each page separately as object instead of array of 50 items.

    console.log("Field resolver called");
    return null;
  }
}

export default PageResolver;

Query example:

{
  pageAll {
    pageParts {
      id,
      name,
      content
    }
  }
}

Any help of what im doing wrong, please? Thanks!

miroslavzeman avatar Sep 11 '18 07:09 miroslavzeman

@miro4994 It's an issue, "To Do in Board", so how this is supposed to work? Is it a feature documented in docs? 😄 https://19majkel94.github.io/type-graphql/

MichalLytek avatar Sep 11 '18 08:09 MichalLytek

Well, its morning... My mistake :D Thanks for quick response.

miroslavzeman avatar Sep 11 '18 08:09 miroslavzeman

I've recently created BatchLoader written in TypeScript. Compared to DataLoader, it only does batching, but does it better. It shares identical apis (load, loadMany) with DataLoader. It handles duplicate keys (further reduces round trips when duplicate ids are provided) and is composable to create a new loader via mapping function. It doesn't do any caching, but can easily be intergated with any cache library. You may want to consider it over DataLoader. :)

joonhocho avatar Oct 05 '18 10:10 joonhocho

it only does batching, but does it better.

What does it means "better"? Can you show an example what is better, or maybe show some benchmarks with comparison to the dataloader?

It doesn't do any caching, but can easily be intergated with any cache library.

I think that Facebook's DataLoader is a well established solution that is known to every graphql developer, so it will be the first choose for now as it has "battery" included. Maybe in the future I will try to expose the data loading API to plugins so library consumers could choose their own baching or caching library.

MichalLytek avatar Oct 07 '18 14:10 MichalLytek

One of the issue of dataloader, is that you have to create a loader per field. So if in your query you have 2 fields that are both related to the same entity, you will have to load them 2 times. For example, you have a table game with a column player1 and another column player2, that both are related to another table user, you will still need 2 loader, one for player1 and one for player2. Would be cool to only have one loader user. Also another weird thing with dataloader is that it use setTimeout to trigger the loading of the data. It would be great to get rid of this and maybe find a solution where graphql library would trigger the loader, maybe with an event or a callback function.

But, I guess to start, using dataloader might be the best approch ;-)

apiel avatar Oct 07 '18 17:10 apiel

@19majkel94

First, batchloader takes care of duplicate keys. For example, for the following code

loader.load(1),
loader.load(2),
loader.load(1),

with BatchLoader, your batch function will get [1, 2] as keys without duplicate keys, while DataLoader gives you [1,2,1]. There is a bit of optimization to remove duplicate keys.

As @apiel noted, with BatchLoader you can map a loader to create another loader to get specific fields or chain another query.

usernameLoader = userLoader.mapLoader((user) => user && user.username);
pictureLoader = userLoader.mapLoader((user) => user && user.picture);

usernameLoader.load(userId)
pictureLoader.load(userId)

both loaders will be sharing the same userLoader's batch queue so there are only one call to userLoader's batch function with keys being [userId].

You can compose a new loader by chaining loaders as well:

chainedLoader = loader1.mapLoader(v => loader2.load(v)).mapLoader(v => ...)

@apiel, We also use setTimeout to create batch queue. I don't see why it's a problem tho. Otherwise, you will need to somehow manage the queue yourself. Do you have any reason that you don't like setTimeout? do you have a better approach or use case?

joonhocho avatar Oct 07 '18 23:10 joonhocho

with BatchLoader, your batch function will get [1, 2] as keys without duplicate keys, while DataLoader gives you [1,2,1]. There is a bit of optimization to remove duplicate keys.

But we are talking about the cached version:

However, when the memoization cache is disabled, your batch function will receive an array of keys which may contain duplicates!

https://github.com/facebook/dataloader#disabling-cache

with BatchLoader you can map a loader to create another loader to get specific fields or chain another query

So can you show me that in this example? How many DB calls to the tables will DataLoader produces and the BatchLoader? 😉

query {
  user(id: 12345) {
	name
	posts(last: 5) {
     title
     comments(last: 3) {
       content 
     }
  }
}

MichalLytek avatar Oct 08 '18 08:10 MichalLytek

I was trying to use custom Repositories from typeorm with partial success. I've done something like this and batching is working fine:

import { EntityRepository, Repository } from "typeorm";
import * as DataLoader from "dataloader";

import { User } from "../types/User";

@EntityRepository(User)
export class UserRepository extends Repository<User> {
  private loader: DataLoader<number, User> = new DataLoader(ids => {
    return this.findByIds(ids);
  });

  findById(id: number) {
    return this.loader.load(id);
  }
}

and later in the UserResolver:

import { Arg, Int, Query } from "type-graphql";
import { InjectRepository } from "typeorm-typedi-extensions";

import { User } from "../types/User";

import { UserRepository } from "../repositories/UserRepository";

export class UserResolver {
  constructor(
    @InjectRepository(User) private readonly userRepository: UserRepository,
  ) {}

  @Query(returns => User, { nullable: true })
  user(@Arg("id", type => Int) id: number) {
    return this.userRepository.findById(id); // It's using DataLoader
  }
}

So in the query like this:

query {
  user1: user(id: 1) {
    fullName
  }
  user2: user(id: 2) {
    fullName
  }
}

DataLoader will receive array of IDs [1, 2];

However, the @InjectRepository decorator will create a new UserRepository instance for each decorator invocation. So in the case like this one it will fail to reuse data loader and caching will not work:

query {
  user1: user(id: 1) {
    fullName
    posts {
      title
      user {
        fullName
      }
    }
  }
  user2: user(id: 2) {
    fullName
  }
}

as there will be two instances of repositories, one for the UserResolver and one for the PostResolver class.

So the DataLoader in the UserResolver class will receive IDs [1, 2] and later the PostResolver DataLoader will receive ID [1] and another database query will have to be made.

Do you know if there is a way of reusing injected repository? I'm new to all the DI stuff in TypeScript. Or I have to use the context object to store data loaders?

lukejagodzinski avatar Oct 17 '18 22:10 lukejagodzinski

Do you know if there is a way of reusing injected repository? I'm new to all the DI stuff in TypeScript. Or I have to use the context object to store data loaders?

Of course you can store dataloader in context. You can also use scoped containers feature: https://19majkel94.github.io/type-graphql/docs/dependency-injection.html#scoped-containers

I was trying to use custom Repositories from typeorm with partial success. I've done something like this and batching is working fine:

You can use typeorm-loader: https://github.com/Webtomizer/typeorm-loader

MichalLytek avatar Oct 18 '18 09:10 MichalLytek

@19majkel94 thanks for pointing me into right direction. Actually, I've used services and put loaders there without using the typeorm-loader library. DataLoaders are quite simple, so didn't want to use any wrappers. Incorporating loaders into entities maybe sounds like good simplification but for me it just hides logic and I prefer to have more control over the process.

The only thing I will probably have to do is to clear container after each request, however that might not be necessary in my case as we will probably end up using serverless or similar lambda function provider.

A few questions about your example from the first post with the @FieldResolver(). I assume that it's just an example and not actual syntax to be used. I think it would be a little bit confusing to resolve multiple items when resolving single field, don't you think? Do you have any update on syntax side? To be honest I think such things doesn't have to be integrated into library. Maybe some example in the repo would be nice but that's it. At least, it's what I think from the current perspective. Maybe you have some great idea for how it could work.

lukejagodzinski avatar Oct 18 '18 22:10 lukejagodzinski

I assume that it's just an example and not actual syntax to be used. I think it would be a little bit confusing to resolve multiple items when resolving single field, don't you think? Do you have any update on syntax side?

No, it's an actual syntax, exactly how it's working in mentioned earlier Vesper: http://vesper-framework.com/#/typescript/resolvers

You can resolve a field for single root (normal mode) or you can resolve a field for multiple roots (batch mode). It's easy, why do you think it's confusing? Do you have a better API proposal?

To be honest I think such things doesn't have to be integrated into library. Maybe some example in the repo would be nice but that's it.

TypeGraphQL design goal is to provide not only class-to-schema feature but also to be a little frameworkish, that's why you have e.g. authorization built-in. I can make this feature really opt-in by a plugin (if the dataloader weight is too big) but in reality batching and caching is a must have for a GraphQL API server.

MichalLytek avatar Oct 19 '18 15:10 MichalLytek

You can resolve a field for single root (normal mode) or you can resolve a field for multiple roots (batch mode). It's easy, why do you think it's confusing? Do you have a better API proposal?

I'm not saying it's bad API but it's just not clearly visible that this resolver is using data loader. I guess that internally you can use reflections to detect if developer provided array or single object and use loader or not. But I think it would be better to explicitly decorate resolver.

TypeGraphQL design goal is to provide not only class-to-schema feature but also to be a little frameworkish, that's why you have e.g. authorization built-in. I can make this feature really opt-in by a plugin (if the dataloader weight is too big) but in reality batching and caching is a must have for a GraphQL API server.

Yep I agree that batching and caching is a must have in a big real life application. I'm not arguing with this :). Would be great to have it built in but it would also be nice to have a little bit of control over how it works.

I'm just curious if in a given example below:

@Resolver(of => Post)
export class PostResolver {
  @FieldResolver()
  user(@Root() post: Post) {
    return this.manager.findOne(User, post.userId);
  }
}

@Resolver(of => User)
export class UserResolver {
  @Query(returns => User, { nullable: true })
  user(@Arg("id", type => Int) id: number) {
    return this.manager.findOne(User, id);
  }

  @FieldResolver()
  posts(@Root() user: User) {
    return this.manager.find(Post, { userId: user.id });
  }
}
query {
  user(id: 1) {
    posts {
      user { # ID = 1
        fullName
      }
    }
  }
}

would it reuse DataLoader for both UserResolver.user query and for PostResolver.user field resolver?

lukejagodzinski avatar Oct 19 '18 20:10 lukejagodzinski

would it reuse DataLoader for both UserResolver.user query and for PostResolver.user field resolver?

No, because dataloaders for field resolvers are tightly coupled to roots and the field resolver args.

And more obvious, as DataLoader works on setTimeout, the user query will return a promise, then for selected user it will query db for user's posts, and then during resolving of post object it will batch the queries for user field.

If you need a more aggressive caching, you can use your own data loaders that works on db layer. Dataloaders on field resolvers has to be universal - it may batch and cache some computation (like averageRating) or HTTP calls, not only DB queries using TypeORM.

Would be great to have it built in but it would also be nice to have a little bit of control over how it works.

Built-in will be universal and easy to use. They will fit for common cases and regular users. If you need more control - disable it and use your own solution. Complicating the API for the advanced 10% of users will make the 90% users sad 😞

Let's just wait for a first version of this feature and then we can find weak spots and discuss about making it more powerful. Now it's just like writing with a stick on the sand 😄

MichalLytek avatar Oct 19 '18 20:10 MichalLytek

No, because dataloaders for field resolvers are tightly coupled to roots and the field resolver args.

That was what I was thinking. It's rather rare case to have such a relation as I presented in the example but as you described for such cases it's better to use custom solution.

Let's just wait for a first version of this feature and then we can find weak spots and discuss about making it more powerful. Now it's just like writing with a stick on the sand

Agree :). When do you plan to have some working version? Could probably help you with some development but would need to spend some time on understanding internals of the library.

SIDE NOTE: (probably shouldn't talk about it here) I see that Vesper looks kinda like your competition but I prefer defining entities and types in the same class (so point for you :) ). I see that for top level queries Vesper has Controllers, and Resolvers only for custom types. I find it a little bit more intuitive, especial for newcomers. So I think it would be nice to have better explanation in docs how typegraphql's API correlates with standard GraphQL one. Just at the beginning it wasn't super intuitive for me, but as I've started using it, it makes much more sense now. Maybe I could help you with some documentation as I keep using it more. I'm just talking from the perspective when I was creating documentation for my library. Sometimes when I read it now, after several years, I don't understand it :). So sometimes, it's good to have documentation being written by someone else or at least being updated by a person who is not an author of the library and don't have full knowledge about why given design decision have been made :)

lukejagodzinski avatar Oct 19 '18 21:10 lukejagodzinski

When do you plan to have some working version?

I have to reorganize the milestones. Release 1.0.0 will be focused only on bug fixing, error reporting, making proper documentation and breaking changes in API that will prepare TypeGraphQL for new features.

Releases after 1.0 will be focused on new features like dataloader - depending on breaking changes, they will be released at 1.x (latest) or 2.0-beta.x (next).

SIDE NOTE (...)

Let's move this offtopic to the separate issue 😉

MichalLytek avatar Oct 20 '18 12:10 MichalLytek

I just learnt the hard way that dataloader instance should not be shared application-wide. I'm getting similar caching problem as https://github.com/facebook/dataloader/issues/65

I have to create dataloader instance per request and pass it to the Context. I'm looking for alternatives that can be used for batch requests as well as caching results.

laukaichung avatar Nov 06 '18 10:11 laukaichung

+10000

gotexis avatar Jul 14 '19 09:07 gotexis

Hi @MichalLytek,

Was wondering if there is a way to inject the root array(for N+1 problem) for a query when using NestJS, cause I'm getting only the Root object?

Thanks, Gil

GilShik avatar Jan 19 '20 13:01 GilShik