efcore.pg icon indicating copy to clipboard operation
efcore.pg copied to clipboard

Have a better story for EFCore.PG and multihost

Open roji opened this issue 10 months ago • 9 comments

Try to have Target Session Attributes in the connection string fails the moment EFCore.PG needs to create a data source internally (full repro below):

.UseNpgsql(
	"Host=x,y;Database=test;Username=test;Password=test;Load Balance Hosts=true;Target Session Attributes=prefer-standby",
	config =>
	{
		config.ConfigureDataSource(o => {});
	})

This throws "When creating a multi-host data source, TargetSessionAttributes cannot be specified".

The Npgsql story for target sessions is to call BuildMultiHost() instead of Build(), and then to call an overload of OpenConnectionAsync() that passes TargetSessionAttributes, or to extract wrapper data sources via WithTargetSession() (docs).

EFCore.PG doesn't know whether the connection string contains multiple hosts and/or Target Session Attribute, and never calls BuildMultiHost(). It seems like the thing to do would be to add an EF-level context option for the target session, and when EF creates the NpgsqlConnection, to call OpenConnectionAsync(TargetSessionAttributes).

Note that we also have legacy mode (without NpgsqlDataSource), where we do allow Target Session Attribute in the connection string; we internally create a wrapper for the given target session and use that. We could do the same when NpgsqlDataSourceBuilder.Build() is called with a Target Session Attribute; but that would mean that the underlying NpgsqlMultiHostDataSource (which owns the actual connection pools) isn't accessible, and it's impossible to have multiple Target Session Attribute values referencing the same physical connections. In legacy mode the NpgsqlMultiHostDataSource is global, so this isn't a problem.

In the meantime, the workaround is simply to construct a multi-host data source outside EF, get a wrapper via WithTargetSession() and pass that to EFCore.PG's UseNpgsql().

@NinoFloris @vonzshik does this all make sense?

Originally raised by @fmendez89 in https://github.com/npgsql/doc/issues/263#issuecomment-2708845653

Full repro
await using var context = new BlogContext();
await context.Database.EnsureDeletedAsync();
await context.Database.EnsureCreatedAsync();

public class BlogContext : DbContext
{
    public DbSet<Blog> Blogs { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseNpgsql(
                "Host=localhost,localhost:6432;Database=test;Username=test;Password=test;Load Balance Hosts=true;Target Session Attributes=prefer-standby",
                config =>
                {
                    // The moment we call ConfigureDataSource(), EF attempts to build a data source internally (with NpgsqlDataSourceBuilder.Build()),
                    // causing an exception because Target Session Attributes is present in the connection string. 
                    config.ConfigureDataSource(o => {});
                })
            .LogTo(Console.WriteLine, LogLevel.Information)
            .EnableSensitiveDataLogging();
}

public class Blog
{
    public int Id { get; set; }
    public string Name { get; set; }
}

roji avatar Mar 10 '25 22:03 roji

Note that we also have legacy mode (without NpgsqlDataSource), where we do allow Target Session Attribute in the connection string; we internally create a wrapper for the given target session and use that. We could do the same when NpgsqlDataSourceBuilder.Build() is called with a Target Session Attribute; but that would mean that the underlying NpgsqlMultiHostDataSource (which owns the actual connection pools) isn't accessible, and it's impossible to have multiple Target Session Attribute values referencing the same physical connections. In legacy mode the NpgsqlMultiHostDataSource is global, so this isn't a problem.

This should already (somewhat) be working (ignoring exception for a sec) as you can call NpgsqlMultiHostDataSource.OpenConnection() (note the lacking of targetSessionAttributes in args), which should open a new connection on the original NpgsqlMultiHostDataSource (and not the wrapper), at which point we'll return a connection to a host according to settings (this should be Any).

So all in all, just removing the check should be enough (I think).

vonzshik avatar Mar 11 '25 10:03 vonzshik

Note that we also have legacy mode (without NpgsqlDataSource), where we do allow Target Session Attribute in the connection string; we internally create a wrapper for the given target session and use that.

BTW, are you sure it's true? Looking at the code, it seems like currently we'll throw the exact same error (barring Any).

Edit: it's kinda even worse, we do not throw for Any as long as there is only a single host, otherwise we do.

Edit2: OK, I missed that we indeed make a wrapper, though removing that shouldn't be a problem.

vonzshik avatar Mar 11 '25 10:03 vonzshik

Submitted https://github.com/npgsql/npgsql/pull/6046

vonzshik avatar Mar 11 '25 12:03 vonzshik

So all in all, just removing the check should be enough (I think).

I'm not sure - how would an EF user using ConfigureDataSource(), be able to use both prefer-standby (for read-only loads) and primary (for write loads), with the same underlying connection pools (i.e. same instance of NpgsqlMultiHostDataSource)? If you have two UseNpgsql() configurations in your application with ConfigureDataSource(), where the only difference between them is the value of the Target Session Attributes in the connection string, then you'd get two completely separate instances of NpgsqlMultiHostDataSource, and not share connections between them... In fact, it's maybe better to keep the exception as-is, to help users avoid falling into this pit of failure...

(this interaction of EF and Npgsql/data source is generally not simple, unfortunately)

roji avatar Mar 11 '25 13:03 roji

I'm not sure - how would an EF user using ConfigureDataSource(), be able to use both prefer-standby (for read-only loads) and primary (for write loads), with the same underlying connection pools (i.e. same instance of NpgsqlMultiHostDataSource)?

I imagine that less than 1% of developers actually need something like this, as it's mostly going to be "I want to only connect to primary". But even then, the main idea is for BuildMultiHost (and Build) to support having a default TargetSessionAttribute, which will be used by calling OpenConnection. And if you want to get a connection with other TargetSessionAttribute, we also support that. Should make the usual experience much smoother.

If you have two UseNpgsql() configurations in your application with ConfigureDataSource(), where the only difference between them is the value of the Target Session Attributes in the connection string, then you'd get two completely separate instances of NpgsqlMultiHostDataSource, and not share connections between them... In fact, it's maybe better to keep the exception as-is, to help users avoid falling into this pit of failure...

Mm, you mean in case of legacy, yeah? That indeed makes sense, kinda forgot about that.

vonzshik avatar Mar 11 '25 13:03 vonzshik

I imagine that less than 1% of developers actually need something like this, as it's mostly going to be "I want to only connect to primary".

Interesting, I'd think the majority just want Any, not Primary, which IIRC is also the default (in both libpq and Npgsql). That's already well-supported via EF too - just don't specify Target Session Attributes and everything works.

Though I guess you're right, and there's no really good reason to block Target Session Attributes with Build(), and for applications where just one attribute is needed (e.g. Primary) it smooths things out. I'm still a bit worried that EF users will unknowingly end up having two connection pools instead of one though...

Mm, you mean in case of legacy, yeah? That indeed makes sense, kinda forgot about that.

In legacy (i.e. no data source) we don't throw (create the wrapper internally) - maybe we're talking about different things?

roji avatar Mar 11 '25 13:03 roji

Interesting, I'd think the majority just want Any, not Primary, which IIRC is also the default (in both libpq and Npgsql).

The problem with this is, most of the apps do write stuff into the database, and since vanilla postgres doesn't support multi-master... getting a standby will most likely just completely break the app. And speaking from experience, the app I'm working on doesn't really care whether there are multiple hosts or not, as long as we connect to primary. So allowing us just calling Build will make things much-much easier.

In legacy (i.e. no data source) we don't throw (create the wrapper internally) - maybe we're talking about different things?

We might mix things a bit. I was mostly talking about https://github.com/npgsql/npgsql/pull/6046 where I did remove that wrapper (it's already back + I added a test just in case).

Though I guess you're right, and there's no really good reason to block Target Session Attributes with Build(), and for applications where just one attribute is needed (e.g. Primary) it smooths things out. I'm still a bit worried that EF users will unknowingly end up having two connection pools instead of one though...

I'm not sure whether you're talking about legacy here or not. In case of NpgsqlDataSourceBuilder, each builder will create separate NpgsqlMultiHostDataSource instances, which do not affect each other. In case of legacy (connection string), right now they will create a singular instance of NpgsqlMultiHostDataSource and share it (even if through a wrapper). Now that you reminded me of that behavior, I brought it back in that pr, so pretty much the only change there (aside of tests) is that you can specify TargetSessionAttribute on connection string and then create an NpgsqlDataSource out of it, which will return connections according to TargetSessionAttribute as long as you call OpenConnection().

vonzshik avatar Mar 11 '25 14:03 vonzshik

+1 for this

I have an ASP.NET Core app that uses EF to connect to a Postgres database (which has a hot standby via streaming replication). The bulk of the work done by the API involves only read operations, but there are a few endpoints here and there doing writes. It would be great if there was some mechanism in EF Core/DI to easily specify that a Controller that only does read operations wants a connection with TargetSessionAttributes.PreferStandby while another Controller that does write operations needs TargetSessionAttributes.ReadWrite

judilsteve avatar Oct 01 '25 05:10 judilsteve

@judilsteve this is already possible - just not the nicest (see the OP).

  • You can have two NpgsqlDataSources registered in DI - use BuildMultiHost() and then extract read-write and read-only NpgsqlDataSources by calling WithTargetSession() on it, and register these as singletons in DI (with different service keys).
  • Then, similarly register two DbContexts with two service keys, one for read-only and one for read-write, each initialized with the right NpgsqlDataSource from above. Finally, get injected with whatever's appropriate for each controller.

The full config for this is something we should look at and show, but it shouldn't be too hard. Of course, it would be nicer if the provider provided some higher-level mechanism for making all this easier, but you shouldn't be blocked on that.

roji avatar Oct 02 '25 09:10 roji