EFCore.BulkExtensions icon indicating copy to clipboard operation
EFCore.BulkExtensions copied to clipboard

Postgresql: syntax error at or near "RETURNING" in simple `BulkInsertOrUpdate`

Open sandreas opened this issue 3 years ago • 5 comments
trafficstars

Hello,

I tried out BulkInsertOrUpdate extensions to improve performance of one of my projects. Unfortunately I failed with my first small example and now I wanted to ask, if I'm doing something wrong?!

My Versions

<PackageReference Include="EFCore.BulkExtensions" Version="6.5.5" />

PostgreSQL 13.1 (Debian 13.1-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit

I have an entity Tag (see below). It has an ID for faster WHERE filters, but for an insert, I want it to be unique by the Value column. Now I want to bulk insert packages of Tags, but only, if the value does not exist - all other columns can be ignored for the insert AND it is INSERT ONLY (NO UPDATES REQUIRED).

CreatedDate and UpdatedDate are automatically set via AppDbContext.SaveChanges override... is that a problem?

public class AppDbContext : DbContext
{
    public DbSet<Tag> Tags { get; set; }
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options)
    {

    }

    public override int SaveChanges()
    {
        var now = DateTimeOffset.UtcNow;

        foreach (var changedEntity in ChangeTracker.Entries())
        {
            if (changedEntity.Entity is ModelBaseDated entity)
            {
                switch (changedEntity.State)
                {
                    case EntityState.Added:
                        entity.CreatedDate = now;
                        entity.UpdatedDate = now;
                        break;
                    case EntityState.Modified:
                        entity.UpdatedDate = now;
                        break;
                }
            }
        }

        return base.SaveChanges();

    }
}

[Index(nameof(Value), IsUnique = true)]
public class Tag: Identifiable<Guid>
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public override Guid Id { get; set; }

    [Attr] public string Value { get; set; } = "";
    [Attr] public DateTimeOffset CreatedDate { get; set; }
    [Attr] public DateTimeOffset UpdatedDate { get; set; }
}    

var tagEntities = new List<Tag>        {
    new()  {Value = "Value1"},
    new()  {Value = "Value2"},
    new()  {Value = "Value3"},
    new()  {Value = "Value4"},
    new()  {Value = "Value4"},
    new()  {Value = "Value3"},
};

var bulkConfig = new BulkConfig
{
    UpdateByProperties = new List<string> { nameof(Tag.Value)}, // handle "Value" as primary key, ignore other fields
    PropertiesToIncludeOnUpdate = new List<string> {""}, // force INSERT ONLY like documented in readme.md
    SetOutputIdentity = true // get Id from DB
};
db.BulkInsertOrUpdate(tagEntities, bulkConfig);

Error:

42601: syntax error at or near "RETURNING"

POSITION: 180

FinalCommandText:
INSERT INTO "Tags" ("Id", "CreatedDate", "UpdatedDate", "Value") (SELECT "Id", "CreatedDate", "UpdatedDate", "Value" FROM "TagsTemp8d6a14b7") ON CONFLICT ("Value") DO UPDATE SET  RETURNING "Id", "CreatedDate", "UpdatedDate", "Value"

It seems that there is something missing between DO UPDATE SET and RETURNING (two spaces)... Maybe this should mean:

INSERT INTO "Tags" ("Id", "CreatedDate", "UpdatedDate", "Value") (SELECT "Id", "CreatedDate", "UpdatedDate", "Value" FROM "TagsTemp8d6a14b7") ON CONFLICT ("Value") DO NOTHING RETURNING "Id", "CreatedDate", "UpdatedDate", "Value"

If I modify the BulkInsertOrUpdate part like this:

        var bulkConfig = new BulkConfig
        {
            UpdateByProperties = new List<string> { nameof(Tag.Value)},
            // PropertiesToIncludeOnUpdate = new List<string> {""},
            PropertiesToIncludeOnUpdate = new List<string> {nameof(Tag.Value)},
            SetOutputIdentity = true
        };
        db.BulkInsertOrUpdate(tagEntities, bulkConfig);

the Query is "valid", but unfortunately I get another error:

          Severity: ERROR
          SqlState: 21000
          MessageText: ON CONFLICT DO UPDATE command cannot affect row a second time
          Hint: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.

This would mean, that in this file: https://github.com/borisdj/EFCore.BulkExtensions/blob/a0ae3f94051c6d28bf0d662718793c892f6f5271/EFCore.BulkExtensions/SQLAdapters/PostgreSql/SqlQueryBuilderPostgreSql.cs#L97

it has to be checked, if equalsColumns is empty and a LIMIT 1 is missing:

            q = $"INSERT INTO {tableInfo.FullTableName} ({commaSeparatedColumns}) " +
                $"(SELECT {commaSeparatedColumns} FROM {tableInfo.FullTempTableName}) LIMIT 1" + // LIMIT 1 added
                $"ON CONFLICT ({updateByColumns}) ";

            if(columnsListEquals.Count == 0) {
                q += $"DO NOTHING";
            } else {
                q += $"DO UPDATE SET {equalsColumns}";
            }

Is that correct? And if so, could you please help me out?

sandreas avatar Jul 26 '22 13:07 sandreas

Can confirm the bug, if you manage to do a fix soon you can make a PR.

borisdj avatar Jul 26 '22 23:07 borisdj

Can confirm the bug, if you manage to do a fix soon you can make a PR.

I'll try my best.

sandreas avatar Jul 27 '22 02:07 sandreas

Here is my PR. Please ensure that LIMIT 1 has no unwanted side effects.

https://github.com/borisdj/EFCore.BulkExtensions/pull/910

sandreas avatar Jul 28 '22 07:07 sandreas

I have merged PR and run Test InsertTestPostgreSql but it fails on line Assert.Equal("UPDATE 3", context.Items.Where(a => a.Name == "Name 3").AsNoTracking().FirstOrDefault()?.Description); You should have run this one and checked it after changes.

It seems that issue is with LIMIT 1 because of which line context.BulkInsertOrUpdate(entities2, new BulkConfig() { NotifyAfter = 1 }, (a) => WriteProgress(a)); Does not insert insert record Name 3 (entities2 is List with [Name2, Name3])

Maybe the solution would be to add config property DoApplyPGLimit1 { get; set; } (false by default)

borisdj avatar Aug 02 '22 14:08 borisdj

Thanks for taking a look at this.

Maybe the solution would be to add config property DoApplyPGLimit1 { get; set; } (false buy default)

Well, If we HAVE to add a config option, I would suggest a more generic one like ApplySubqueryLimit with 0as default being not applied in pgsql and !=0 being applied as its value, BUT something is fishy with this. I would do some further investigation before adding another possibly unnecessary config option, maybe we can prevent that.

sandreas avatar Aug 02 '22 15:08 sandreas

Have you found a way to resolve this with some internal condition, without explicit config?

borisdj avatar Aug 08 '22 19:08 borisdj

Not yet. I'm currently busy with family stuff. But I think I can look at it in roughly a week. If that is too late, you might as well go for the generic config option.

sandreas avatar Aug 08 '22 19:08 sandreas

Would this work for your case, to add LIMIT only when DO NOTHING

q = $"INSERT INTO {tableInfo.FullTableName} ({commaSeparatedColumns}) " +
    $"(SELECT {commaSeparatedColumns} FROM {tableInfo.FullTempTableName}) " +
    $"ON CONFLICT ({updateByColumns}) ";

if(columnsToUpdate.Count == 0 || string.IsNullOrWhiteSpace(equalsColumns)) {
    q = q.Replace("ON CONFLICT", "LIMIT 1 ON CONFLICT");
    q += "DO NOTHING";
} else{
    q += $"DO UPDATE SET {equalsColumns}";
}

borisdj avatar Aug 08 '22 19:08 borisdj

Phew, probably... but I'm not a big fan of this kind of replacements, because if there is a table or column named ON CONFLICT (although pretty unlikely), it would also replace these...

But hey, since I don't have a better solution, maybe the "whatever works" approach is enough until there is something better.

sandreas avatar Aug 08 '22 19:08 sandreas

I'll fix it this way so that can publish new nuget. Will put with Bracket )...(, less chance of conflict with ON CONFLICT :D q = q.Replace(") ON CONFLICT (", ") LIMIT 1 ON CONFLICT (");

If you manage to find a better fix make a PR, otherwise we can leave it so.

borisdj avatar Aug 08 '22 19:08 borisdj

Thank you. This will be enough for my purpose and I'm grateful for your support. Good job.

sandreas avatar Aug 08 '22 19:08 sandreas

Actually I've refactored it the following way (cleaner):

bool applySubqueryLimit = columnsToUpdate.Count == 0 || string.IsNullOrWhiteSpace(equalsColumns);
var subqueryText = applySubqueryLimit ? "LIMIT 1 " : "";

q = $"INSERT INTO {tableInfo.FullTableName} ({commaSeparatedColumns}) " +
    $"(SELECT {commaSeparatedColumns} FROM {tableInfo.FullTempTableName}) " + subqueryText + 
    $"ON CONFLICT ({updateByColumns}) " +
    (applySubqueryLimit
     ? "DO NOTHING"
     : $"DO UPDATE SET {equalsColumns}");

borisdj avatar Aug 08 '22 19:08 borisdj

allright :-)

sandreas avatar Aug 08 '22 19:08 sandreas

@sandreas with 7.1.5, code is changed so now there is BulkConfig ApplySubqueryLimit for this.

borisdj avatar Jul 25 '23 14:07 borisdj