graphql-platform icon indicating copy to clipboard operation
graphql-platform copied to clipboard

Overfetching problem when querying related entities and using intermediary projections

Open aradalvand opened this issue 5 years ago • 67 comments

Following https://github.com/ChilliCream/hotchocolate/issues/2365#issuecomment-697727508 I have Book and Author domain classes + BookDto and AuthorDto classes, I want to expose an IQueryable<BookDto> for the books field on my query root type. So here is the GetBooks method on the Query class: It simply projects Book into BookDto and returns an IQueryable<BookDto> so that Hot Chocolate could then do its magic on it.

[UseSelection]
public IQueryable<BookDto> GetBooks([Service]AppDbContext dbContext)
{
    return dbContext.Books.Select(book => new BookDto
    {
        Name = book.Name,
        ...
        Author = new AuthorDto
        {
            Id = book.Author.Id,
            Name = book.Author.Name,
        }
    });
}

Now, when you run the following query:

{
  books {
    name
    author {
      name
    }
  }
}

You would probably expect the following SQL to ultimately get generated:

SELECT [b].[Name], [a].[Name]
      FROM [Books] AS [b]
      INNER JOIN [Authors] AS [a] ON [b].[AuthorId] = [a].[Id]

However, instead, Hot Chocolate causes this one to get generated by EF Core:

SELECT [b].[Name], [a].[Id], [a].[Name], [a].[Age]
      FROM [Books] AS [b]
      INNER JOIN [Authors] AS [a] ON [b].[AuthorId] = [a].[Id]

Which retrieves all the fields on the author (including Age and Id in this case), even though the query requested only the Name.

At first, I thought this must be an EF Core issue, but then I executed the following code:

var queryableDto = dbContext.Books.Select(book => new BookDto
{
    Name = book.Name,
    Author = new AuthorDto
    {
        Id = book.Author.Id,
        Name = book.Author.Name,
        Age = book.Author.Age
    }
});
// The code that you would expect Hot Chocolate will generate for the above GraphQL query:
queryableDto.Select(bDto => new
{
    Name = bDto.Name,
    Author = new
    {
        Name = bDto.Author.Name,
    }
}).ToList();

And I realized it works as expected, and generates the following SQL which only selects the requested column (author name in this case):

SELECT [b].[Name], [a].[Name]
      FROM [Books] AS [b]
      INNER JOIN [Authors] AS [a] ON [b].[AuthorId] = [a].[Id]

Therefore, this must be related to Hot Chocolate, I believe. This is really a serious problem for situations like mine when we don't want to expose IQueryable<TheEntityFrameworkModelClass> but rather a different type.

I'm using Hot Chocolate version 10.5.2 And I've tried this with both EF Core version 3.1.8 and 5.0.0 RC.

aradalvand avatar Sep 24 '20 02:09 aradalvand

i will track this down in the new implementation for 11.

It should just work, because we prjoect like this:

{
  books {
    name
    author {
      name
    }
  }
}
.Select(x=> new BookDto
            {
                Name = x.Name,
                Author = new AuthorDto
                { 
                    Name = x.Author.Name,
                }
            })

PascalSenn avatar Sep 24 '20 06:09 PascalSenn

@PascalSenn Ok, great to hear. Then we'll have to wait for v11 before we can deploy our app so that this issue is fixed until then. Thanks.

It should just work, because we prjoect like this:

Well, as I mentioned here, running the code that you said that Hot Chocolate would generate from the GraphQL query:

queryableDto.Select(bDto => new
{
    Name = bDto.Name,
    Author = new
    {
        Name = bDto.Author.Name,
    }
}).ToList();

would NOT generate the SQL that Hot Chocolate causes to be generated. I talked about this in the original comment and also here. So I think that's probably not exactly how the projection works in Hot Chocolate. And I feel like this might be a little mistake or something somewhere, it's probably not that big of a deal on your side, I don't know, I'm just guessing! But it is, however, a very big deal for our side! Since all the data of a related entity gets fetched unnecessarily when you request even just one column. So, I'd appreciate it if you look into this in the coming days! Thank you so much.

aradalvand avatar Sep 24 '20 14:09 aradalvand

Does this query generate the desired result?

queryableDto.Select(x=> new BookDto
            {
                Name = x.Name,
                Author = new AuthorDto
                { 
                    Name = x.Author.Name,
                }
            }).ToList();

PascalSenn avatar Sep 24 '20 14:09 PascalSenn

@PascalSenn Yes, that query would generate the following SQL:

SELECT [b].[Name], [a].[Name]
      FROM [Books] AS [b]
      INNER JOIN [Authors] AS [a] ON [b].[AuthorId] = [a].[Id]

which is perfect. It only retrieves the name of the book and the name of the author, as requested.

However, sending this GraphQL query to our Hot Chocolate server:

{
  books {
    name
    author {
      name
    }
  }
}

would generate this SQL instead:

SELECT [b].[Name], [a].[Id], [a].[Name], [a].[Age]
      FROM [Books] AS [b]
      INNER JOIN [Authors] AS [a] ON [b].[AuthorId] = [a].[Id]

which as you can see is unnecessarily fetching the Id and Age of the author, while they were not queried.

I can also create a public GitHub repo and push this project into it so you can clone it and look into it and into the generated queries yourself. A super-simple project with less than 15 classes in total.

aradalvand avatar Sep 24 '20 14:09 aradalvand

Yes a repro would be great

Are there any custom resolvers on author?

PascalSenn avatar Sep 24 '20 14:09 PascalSenn

@PascalSenn Sure! I'll do it and let you know. And no, nothing, it's a just class with a handful of properties. I've added nothing HotChocolate-specific to it at all. I'll create the repo shortly ;)

aradalvand avatar Sep 24 '20 14:09 aradalvand

@PascalSenn Here you go, sir: https://github.com/AradAral/AspNetCoreGraphQL

Edit: I added some seed data for the database so if you want to clone the project you can just adjust the connection string and then run one initial migration, as you know, and then you can use the GraphQL endpoint and inspect the generated SQL queries. I also added a MyController, which manually does the querying; you can check the query that that generates too. Thanks in advance.

aradalvand avatar Sep 24 '20 15:09 aradalvand

@PascalSenn: :)

Untitled Project

aradalvand avatar Sep 24 '20 16:09 aradalvand

@AradAral cool, the repro is definitely helpful. Checked the configuration didn't see anything obvious wrong with it.

PascalSenn avatar Sep 24 '20 16:09 PascalSenn

@PascalSenn I just noticed something else which I think might be important: I removed the UseSelection middleware from the GetBooks method, but it still gave me the author data if I queried it, which normally wouldn't happen unless you use the UseSelection middleware. Do you think that can tell us anything about the issue?

aradalvand avatar Sep 24 '20 19:09 aradalvand

Did you also remove it from the resolver declared in the QueryType?

It is probably always requested in this case, because Select statement that maps from object to dto projects it already

PascalSenn avatar Sep 24 '20 19:09 PascalSenn

Yes, I actually removed the whole QueryType class and it's now just the Query class, without the UseSelection attribute on top of the resolver. Now that the UseSelection middleware is not present irrespective of what field I query everything (including the author data) is retrieved.

aradalvand avatar Sep 24 '20 19:09 aradalvand

@PascalSenn I think I just found out what's causing the problem: By building an expression visitor and inspecting the expression that Hot Chocolate passes to the Select method, I just realized that it (the expression Hot Chocolate generates) looks like this:

e => new BookDto()
    {
        Name = e.Name,
        Author = e.Author == null ? default(AuthorDto) : new AuthorDto() { Name = e.Author.Name }
    }

The important bit is where it's checking whether Author is null or not (e.Author == null ? ...), that, as it turns out, is exactly what's causing the whole author object to be fetched from the database. This works properly with EF Core domain classes, in other words, if e.Author was an entity class, but not with DTO classes like this.

So, now, what you do think can be done about this? Does Hot Chocolate allow us to disable null checks so that it leaves it up to me to check for nulls, or perhaps do the null check differently so as to avoid this, or really anything that could solve this problem somehow? Thank you in advance!

aradalvand avatar Sep 26 '20 00:09 aradalvand

Based on the analysis above, I'll close this issue. For anyone curious, I eventually ended up writing a custom expression visitor that replaces any occurrence of e.Author == null that Hot Chocolate generates to e.Author.Id == null, so as to avoid the aforementioned over-fetching problem.

aradalvand avatar Nov 10 '20 18:11 aradalvand

@PascalSenn Hi! I didn't know that, what change did you make to fix this? I thought there was nothing you could do about it.

aradalvand avatar Nov 10 '20 18:11 aradalvand

Sorry i messed something up. This could still be an issue

PascalSenn avatar Nov 10 '20 18:11 PascalSenn

we just fixed UseProjections to do this: Author = e.Author == null ? default(AuthorDto) : new AuthorDto() { Name = e.Author.Name }

I need to check what it does with reprojection

PascalSenn avatar Nov 10 '20 18:11 PascalSenn

@PascalSenn But isn't Author = e.Author == null ? default(AuthorDto)... what HC already did in the previous versions? What has changed exactly?

aradalvand avatar Nov 10 '20 18:11 aradalvand

Yes, but UseProjection did something different. It returned null values. I was thinking this issue was related to that, then i read more into it :) sorry for the messup

PascalSenn avatar Nov 10 '20 18:11 PascalSenn

@PascalSenn Aha! Okay, got it! No problem ;)

aradalvand avatar Nov 10 '20 18:11 aradalvand

It looks like there's still an issue on deeply nested entities.

[UseProjection]
public IQueryable<BookDto> GetBooks([Service]AppDbContext dbContext)
{
    return dbContext.Books.Select(book => new BookDto
    {
        Name = book.Name,
        ...
        Author = new AuthorDto
        {
            Id = book.Author.Id,
            Name = book.Author.Name,
            Address = new AddressDto
            {
                Street = book.Author.Address.Street
            }
        }
    });
}

where the Address is still joined when we only query like so

{
  books {
    name
    author {
      name
    }
  }
}

gojanpaolo avatar Jan 06 '21 06:01 gojanpaolo

@gojanpaolo Hi! It's actually not just about deeply nested objects. Whenever you have something like [Something] = new [Something]Dto { } in your projection that you then return in your resolvers, even one field getting requested by the GraphQL client on that [Something]Dto causes the whole object to be retrieved from the database. I explained why that happens here.

The reason why in your case Address is being retrieved is because, again, you're querying Author.Name and consequently the whole Author object is getting retrieved, which means everything inside of it is getting retrieved, including every nested object, like Address in your case.

aradalvand avatar Jan 06 '21 11:01 aradalvand

I was expecting the projection to behave similar to

var dto = 
   dbContext.Books.Select(book => new BookDto
    {
        Name = book.Name,
        ...
        Author = new AuthorDto
        {
            Id = book.Author.Id,
            Name = book.Author.Name,
            Address = new AddressDto
            {
                Street = book.Author.Address.Street
            }
        }
    });

var result =
    dto.Select(bookDto => new
    {
        Name = bookDto.Name,
        Author = new
        {
            Name = bookDto.Author.Name
        }
    })
    .ToList();

which EF correctly translates to the SQL query I expected, i.e. no join on the Address.

I believe this is something we can fix either in hotchocolate/UseProjection itself, or in my graphql layer code.

gojanpaolo avatar Jan 06 '21 15:01 gojanpaolo

Im also experiencing overfetching:

the query

users { id }

works fine and produces the following SQL

SELECT [u].[Id]
FROM [User] AS [u]

whereas this query

users {
  id
  author { name }
}

unecessary loads more data:

SELECT [u].[Id], [a].[Id], [a].[Name], [a].[UserId], [b].[Id], [b].[AuthorId], [b].[Title]
FROM [User] AS [u]
LEFT JOIN [Author] AS [a] ON [u].[Id] = [a].[UserId]
LEFT JOIN [Book] AS [b] ON [a].[Id] = [b].[AuthorId]
ORDER BY [u].[Id], [a].[Id], [b].[Id]
[UseProjection]
public IQueryable<User> Users([Service] DataContext ctx)
{
     return ctx.User
         .Include(v => v.Author)
         .ThenInclude(v => v.Books);
}

Using efcore 5 and hot chocolate 11

jannikbuschke avatar Apr 20 '21 21:04 jannikbuschke

I see this issue was closed alongside a load of AutoMapper tickets, but this doesn't seem related to AutoMapper? I'm having the same issue here using projections, where all fields are requested in the SQL query, despite only requesting one field of a sub-object

brickscrap avatar Jan 31 '22 17:01 brickscrap

@brickscrap can you share your resolver and query?

PascalSenn avatar Jan 31 '22 17:01 PascalSenn

Ignore me - despite trying to figure it out for hours before commenting here, 10 minutes after posting I found the problem. I had UseProjection enabled on the parent object, and this seemed to be causing issues. Removed it, and it's all working as expected now.

Sorry for bothering you

brickscrap avatar Jan 31 '22 21:01 brickscrap

I am getting the same issue where it is fetching all the columns in child instead of what is selected.

query (im using serial execution)

public class Query
{
    [UseProjection]
    public IQueryable<PersonDto> GetPersons([Service] SomeDbContext dbContext)
    {
        return dbContext.Persons.Select(person => new PersonDto
        {
            Id = person.Id,
            FirstName = person.FirstName,
            LastName = person.LastName,
            Guardian = person.Guardian != null ? new GuardianDto
            {
                Id = person.Guardian.Id,
                FirstName = person.Guardian.FirstName,
                LastName = person.Guardian.LastName
            } : default
        });
    }
}

dtos (objects)

public class PersonDto
{
    public int Id { get; set; }
    public string FirstName { get; set; } = null!;
    public string LastName { get; set; } = null!;
    public GuardianDto? Guardian { get; set; } = null!;
}

public class GuardianDto
{
    public int Id { get; set; }
    public string FirstName { get; set; } = null!;
    public string LastName { get; set; } = null!;
}

selections

query {
  persons {
    id
    firstName
    guardian {
      id
      firstName
    }
  }
}

sql

SELECT "p"."Id", "p"."FirstName", "g"."Id" IS NOT NULL, "g"."Id", "g"."FirstName", "g"."LastName"
      FROM "Persons" AS "p"
      LEFT JOIN "Guardians" AS "g" ON "p"."GuardianId" = "g"."Id"

here's a repo if you need the fullest context: https://github.com/nollidnosnhoj/testchocolate

EDIT: another thing interesting is when i did this with a collection of children dtos, it projected correctly.

nollidnosnhoj avatar Feb 08 '22 04:02 nollidnosnhoj

For anyone still having this problem, I explained the reason why it's happening in this comment.

I think since this pattern of having "intermediary projections" (like DTOs) is rather common when using Hot Chocolate along with Entity Framework, it would make sense if Hot Chocolate provided a configuration setting or something on its projection engine to allow the developer to choose how the existence or non-existence of a single related entity is determined in the projection.

aradalvand avatar Feb 08 '22 10:02 aradalvand

So let me reopen this issue:

aradalvand avatar Feb 08 '22 10:02 aradalvand