efcore icon indicating copy to clipboard operation
efcore copied to clipboard

Query only Base classes in TPT

Open Wheelnius opened this issue 2 years ago • 5 comments

I am looking for ways to query only Base classes in a TPT relationship, in EF Core 7.

With my example:

Entities Definition:

public record BaseClass(Guid Id, string Text);
public record DerivedSecondClass(Guid Id, string Text, string Name) : BaseClass(Id, Text);
public record DerivedThirdClass(Guid Id, string Text, string Name) : BaseClass(Id, Text);

Configuration:

public class BaseClassConfiguration : IEntityTypeConfiguration<BaseClass>
{
    public void Configure(EntityTypeBuilder<BaseClass> builder)
    {
        builder.ToTable("BaseClasses", "Default");
    }
}

public class DerivedSecondClassConfiguration : IEntityTypeConfiguration<DerivedSecondClass>
{
    public void Configure(EntityTypeBuilder<DerivedSecondClass> builder)
    {
        builder.ToTable("DerivedSecondClasses", "Default");
    }
}

public class DerivedThirdClassConfiguration : IEntityTypeConfiguration<DerivedThirdClass>
{
    public void Configure(EntityTypeBuilder<DerivedThirdClass> builder)
    {
        builder.ToTable("DerivedThirdClasses", "Default");
    }
}

Current standard behaviour is this:

Interaction with DB:

var baseClasses = await _context
	.Set<BaseClass>()
	.Where(b => b.Text == "ABC")
	.ToListAsync();

Generated SQL:

SELECT TOP(1) [b].[Id], [b].[Text], [a].[Name], [a0].[Name], CASE
    WHEN [a0].[Id] IS NOT NULL THEN N'DerivedSecondClass'
    WHEN [a].[Id] IS NOT NULL THEN N'DerivedThirdClass'
END AS [Discriminator]
FROM [Default].[BaseClasses] AS [b]
LEFT JOIN [Default].[DerivedSecondClasses] AS [a] ON [b].[Id] = [a].[Id]
LEFT JOIN [Default].[DerivedThirdClasses] AS [a0] ON [b].[Id] = [a0].[Id]
WHERE [b].[Text] = N'ABC'

So, what I am looking for, is a query generated like this:

SELECT TOP(1) [b].[Id], [b].[Text]
FROM [BaseClasses] AS [b]
WHERE [b].[Text] = N'ABC'

So far, I have been able to achieve this in this way, which does not suit me, since I do not have constructors that directly map to every property in my real life scenario:

var baseClasses = await _context
	.Set<BaseClass>()
	.Where(b => b.Text == "ABC")
	.Select(s => new BaseClass(s.Id, s.Text))
	.ToListAsync();

Is there anything that EF Core can do to make this happen?

EF Core version: 7 Database provider: Microsoft.EntityFrameworkCore.SqlServer Target framework: .NET 7.0 Operating system: IDE: Visual Studio 2022

Wheelnius avatar Jun 23 '23 12:06 Wheelnius

@Wheelnius With TPT mapping, the type of object is determined by which tables contain a row for the entity's ID. EF won't create an instance of a type that the entity is not mapped to, so it always needs to query the additional tables to determine what type to create. Even if you filter to a single type with .Where(e => e.GetType() == typeof(BaseClass) or similar, EF will still query the other tables so that it knows not to create instances that are of a derived type. The only way to get values from the base table regardless of its type, is to project those properties, as you demonstrate.

ajcvickers avatar Jun 26 '23 13:06 ajcvickers

From C# perspective, this seems only a bit odd to me. When expecting a BaseClass, you would usually only work with the BaseClass type, except in rarer cases where you might need to know the derived type of this class (in which case it might make sense to have joining optional).

I am also wondering why RawSql queries do not work in this scenario?

Wheelnius avatar Jun 30 '23 07:06 Wheelnius

When expecting a BaseClass, you would usually only work with the BaseClass type, except in rarer cases where you might need to know the derived type of this class (in which case it might make sense to have joining optional).

@Wheelnius the point is that you cannot know whether a row corresponds to a .NET BaseClass instance only by looking at the BaseClass table in the database; if there's a corresponding row in one of the child tables, that means that both of these rows correspond to the child type, but by looking at the BaseClass table alone it's not possible to discern what the concrete type is.

Note that when mapping an inheritance hierarchy (with TPT or another strategy), .NET types do not correspond in a 1:1 way to database tables, and so you shouldn't have an expectation that only the base table is queried when your LINQ query asks only for the base .NET type.

I am also wondering why RawSql queries do not work in this scenario?

What exactly are you referring to? Can you please post a minimal, runnable code sample that shows what's not working?

roji avatar Jun 30 '23 13:06 roji

There should be a way to work with the base entity only without querying all the derived types.

var entity = context.Set<BaseClass>().FirstOrDefault(x=> x.Id == myEntityId);
entity.Text = "test";
context.SaveChanges();

This now generates a select with multiple left joins (in my company project we have more than 20) only to find the Discriminator but if the base entity is without extension table it returns NULL and creates a BaseClass entity without the need to project the entity to other class. One way this could be fixed is to use another sealed entity BaseClassOnly that is mapped to the same base table but this is no longer possible.

Could we specify the Discriminator with extension method?

context
 .Set<BaseClass>()
 .WhereDiscriminator(NULL)
 .FirstOrDefault(x=> x.Id == myEntityId);

that generates

SELECT TOP(1) [Id], [Text], NULL AS [Discriminator]
FROM [BaseClasses]
WHERE [Id] = @__p_0

instead of

SELECT TOP(1) [o].[Id], [o].[Text], CASE
    WHEN [m].[Id] IS NOT NULL THEN N'DerivedSecondClasses'
END AS [Discriminator]
FROM [dbo].[BaseClasses] AS [o]
LEFT JOIN [dbo].[DerivedSecondClasses] AS [m] ON [o].[Id] = [m].[Id]
WHERE [o].[Id] = @__p_0

petarpetrovt avatar Jul 13 '23 10:07 petarpetrovt

There should be a way to work with the base entity only without querying all the derived types.

It's worth insisting on the right terminology; you are querying only for the base entity, but doing so requires joining with all the tables. The entity is the thing in the EF model, and in the case of TPT it's mapped to multiple tables in the database.

You've chosen the TPT inheritance strategy - this really is how it functions; there's no way to know the entity type corresponding to a row in the base table without also looking at the other tables. If you were to query only the base table, you'd have no way of knowing if the row corresponds to a base entity, or just to the base part of a non-base entity (with the other part(s) being in other tables in the hierarchy).

This sort of inefficiency is one reason why TPT is generally discouraged. If you were to use either TPC or TPH mapping, this problem doesn't occur, since you only need to query a single table in order to know the entity type. TPC can be a pretty good alternative to TPT, providing very good performance.

roji avatar Jul 13 '23 14:07 roji

@roji @ajcvickers is this really not up for consideration? consider a simple TPT scenario where there is the need to obtain a list of "base entities" without actually caring about columns in the concrete tables

Entities

public class Person
{
    public int Id { get; set; }

    public string Name { get; set; }

    public DateTime DateOfBirth { get; set; }

    public int Age => DateTime.UtcNow.Year - DateOfBirth.Year;
}

public class Student : Person
{
}

public class Teacher : Person
{
}

Context

public class SchoolContext : DbContext
{
    public DbSet<Person> People { get; set; }
    public DbSet<Student> Students { get; set; }
    public DbSet<Teacher> Teachers { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        base.OnModelCreating(modelBuilder);

        modelBuilder.Entity<Person>()
            .ToTable("person")
            .HasKey(x => x.Id)
            ;

        modelBuilder.Entity<Teacher>()
            .ToTable("teacher")
            .HasBaseType<Person>()
            ;

        modelBuilder.Entity<Student>()
            .ToTable("student")
            .HasBaseType<Person>()
            ;
    }
}

API

app.MapGet("/people", async (
    [FromServices] SchoolContext context,
    [FromQuery] int? youngerThan,
    [FromQuery] int? olderThan,
    [FromQuery] int? limit,
    [FromQuery] int? offset)
    =>
{
    offset ??= 0;
    limit ??= 10;

    var query = context.People.AsQueryable();
    if (youngerThan.HasValue)
    {
        query = query.Where(x => x.DateOfBirth > DateTime.UtcNow.AddYears(-youngerThan.Value));
    }

    if (olderThan.HasValue)
    {
        query = query.Where(x => x.DateOfBirth < DateTime.UtcNow.AddYears(-olderThan.Value));
    }

    var totalCount = await query
        .CountAsync();

    var people = await query
        .Skip(offset.Value)
        .Take(limit.Value)
        .ToListAsync();

    return new
    {
        totalCount,
        count = people.Count,
        people
    };
});

Generated queries

info: Microsoft.EntityFrameworkCore.Infrastructure[10403]
      Entity Framework Core 6.0.0 initialized 'SchoolContext' using provider 'Npgsql.EntityFrameworkCore.PostgreSQL:6.0.0+025fc55189ae41e3f2b98bcbe4eb44c505653700' with options: DetailedErrorsEnabled
info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (8ms) [Parameters=[@__p_0='?' (DbType = Int32), @__p_1='?' (DbType = Int32)], CommandType='Text', CommandTimeout='30']
      SELECT COUNT(*)::INT
      FROM person AS p
      WHERE (p."DateOfBirth" > (now() + CAST((@__p_0::text || ' years') AS interval))) AND (p."DateOfBirth" < (now() + CAST((@__p_1::text || ' years') AS interval)))
info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (2ms) [Parameters=[@__p_0='?' (DbType = Int32), @__p_1='?' (DbType = Int32), @__p_3='?' (DbType = Int32), @__p_2='?' (DbType = Int32)], CommandType='Text', CommandTimeout='30']
      SELECT p."Id", p."DateOfBirth", p."Name", CASE
          WHEN t."Id" IS NOT NULL THEN 'Teacher'
          WHEN s."Id" IS NOT NULL THEN 'Student'
      END AS "Discriminator"
      FROM person AS p
      LEFT JOIN student AS s ON p."Id" = s."Id"
      LEFT JOIN teacher AS t ON p."Id" = t."Id"
      WHERE (p."DateOfBirth" > (now() + CAST((@__p_0::text || ' years') AS interval))) AND (p."DateOfBirth" < (now() + CAST((@__p_1::text || ' years') AS interval)))
      LIMIT @__p_3 OFFSET @__p_2

The count query is good, but the second query really does not need the joins if the consumer does not need to "cast" the results to either Student or Teacher - the object could be initialized as Person

The workaround is to select specific fields (using anonymous or even reusing the class works - it is just a slight pain):

var simplePeople = await query
    .Skip(offset.Value)
    .Take(limit.Value)
    .Select(x => new Person
    {
        Id = x.Id,
        Name = x.Name,
        DateOfBirth = x.DateOfBirth,
    })
    .ToListAsync();

Generates

info: Microsoft.EntityFrameworkCore.Database.Command[20101]
      Executed DbCommand (1ms) [Parameters=[@__p_0='?' (DbType = Int32), @__p_1='?' (DbType = Int32), @__p_3='?' (DbType = Int32), @__p_2='?' (DbType = Int32)], CommandType='Text', CommandTimeout='30']
      SELECT p."Id", p."Name", p."DateOfBirth"
      FROM person AS p
      WHERE (p."DateOfBirth" > (now() + CAST((@__p_0::text || ' years') AS interval))) AND (p."DateOfBirth" < (now() + CAST((@__p_1::text || ' years') AS interval)))
      LIMIT @__p_3 OFFSET @__p_2

So I guess the use case is: if the base type is not abstract, let us choose if we really need the subclasses to be initialized or if we are good with just having the base (non abstract) class as elements in the returned list.

andremantaswow avatar Apr 04 '24 20:04 andremantaswow

@andremantaswow you need to carefully read the messages just above. EF queries deal with entities - not tables - and with TPT, the same entity instance is spread across multiple tables. EF cannot know the type of an entity just by looking at the base table, it must join with subtables for that.

roji avatar Apr 05 '24 08:04 roji

@roji i understand this is as designed and was never planned, but:

So I guess the use case is: if the base type is not abstract, let us choose if we really need the subclasses to be initialized or if we are good with just having the base (non abstract) class as elements in the returned list.

To me this seems a valid use case and there is no mention of "tables" here, just "entities". In short it would be good to keep the benefits of using TPT while also allowing to query only the base (non abstract) type. eg:

// current behavior works perfectly for these use cases, there are only the necessary joins
var aSubtypes = context.Set<Base>().OfType<SubA>().ToListAsync();
var bSubtypes = context.Set<Base>().OfType<SubB>().ToListAsync();

// here there are unnecessary joins that can be avoided
// because all we need is to initialize instances of type Base and populate their properties
var bases = context.Set<Base>().OfType<Base>().ToListAsync();

andremantaswow avatar Apr 05 '24 12:04 andremantaswow

// here there are unnecessary joins that can be avoided // because all we need is to initialize instances of type Base and populate their properties var bases = context.Set<Base>().OfType<Base>().ToListAsync()

So once again; the Base table contains rows for entity types of type Base, but also for entity types of types SubA and SubB. The only way to know which rows rows in the Base table correspond to Base entity types - which is what your LINQ query above is requesting - is joining with the SubA/SubB tables and making sure there are no rows there. This is simply how TPT works, since the Base table does not contain the information to tell us the type of the entity type represented by each row.

roji avatar Apr 05 '24 14:04 roji

Maybe we need a mixed mode so that we have a discriminator column in the base entity and all extra columns/properties mapped to other tables.

petarpetrovt avatar Apr 05 '24 14:04 petarpetrovt

@petarpetrovt if you're concerned about the extra joins, you really should be looking at alternative inheritance strategies, such as TPC - any specific reason you're not doing that?

roji avatar Apr 05 '24 14:04 roji

@roji My case seems to be more niche, as I am trying to use EF on an existing database. In my case, I have a base table with a string discriminator and extension tables per discriminator, which are optional. So, I can have entities that are mapped to the base table without an additional table. As far as my understanding goes, that is not possible with TPC.

petarpetrovt avatar Apr 05 '24 15:04 petarpetrovt

@petarpetrov yeah, that's a hybrid scenario which isn't supported.

roji avatar Apr 05 '24 16:04 roji