Bulk Insert/Update sql genereated on staging table causes clustered index scan
Description
Having a problem with the query generated by BulkMergeAsync, it ads null checks to query avoding our index, this results in a CLUSTERED INDEX SCAN instead of prefered INDEX SEEK
await ((BobContext)UnitOfWork.Context).BulkMergeAsync(bobsList, options => { options.ColumnInputExpression = c => new { c.Guid, c.Weight, # alot more properties removed for example. }; options.IgnoreOnMergeUpdateExpression = entity => new { entity.Guid }; options.ColumnPrimaryKeyExpression = key => new { key.LicensePlate, key.Colony}; options.AllowDuplicateKeys = true; options.OnMergeUpdateUseCoalesce = true; });
This generates sql: ` MERGE INTO [Data].[Bobs] AS DestinationTable USING ( SELECT TOP 100 PERCENT *, ROW_NUMBER() OVER ( PARTITION BY [LicensePlate],[Colony] ORDER BY ZZZ_Index DESC ) AS ZZZ_Row_Number FROM (SELECT TOP 100 PERCENT * FROM #ZZZProjects_71fd33fb_4951_4774_abfb_3224f4906d5b WHERE ZZZ_Index >= @IndexStart AND ZZZ_Index <= @IndexEnd ORDER BY ZZZ_Index) AS StagingTable ORDER BY ZZZ_Index
) AS StagingTable ON (DestinationTable.[LicensePlate] = StagingTable.[LicensePlate] OR (DestinationTable.[LicensePlate] IS NULL AND StagingTable.[LicensePlate] IS NULL)) AND (DestinationTable.[Colony] = StagingTable.[Colony] OR (DestinationTable.[Colony] IS NULL AND StagingTable.[Colony] IS NULL)) WHEN MATCHED AND ZZZ_Row_Number = 1 THEN UPDATE SET =========>>> NOICE REMOVED <<<=========== WHEN NOT MATCHED AND (ZZZ_Row_Number = 1 OR ( StagingTable.[LicensePlate] IS NULL AND StagingTable.[Colony] IS NULL )) THEN INSERT ( =========>>> NOICE REMOVED <<<=========== ) ` The important part
ON (DestinationTable.[LicensePlate] = StagingTable.[LicensePlate] OR (DestinationTable.[LicensePlate] IS NULL AND StagingTable.[LicensePlate] IS NULL)) AND (DestinationTable.[Colony] = StagingTable.[Colony] OR (DestinationTable.[Colony] IS NULL AND StagingTable.[Colony] IS NULL))
This ends up in a CLUSTERED INDEX SCAN because there are 8 million bobs in our table this takes about 5-10 seconds (end up with a PAGE LOCK or TABLE LOCK.)
If we take the raw query and remove the IS NULL checks we get a INDEX SEEK on our uniqe index for LicencePlate+Colony and the query takes only some ms.
Is there any way to avoid the null checks?
I know we are using a bit outdated version, but this nullchecks are also added by the 5.2.x versions of this.
Further technical details
- EF version: 5.0.7
- EF Extensions version: 5.1.41
- Database Provider: SQL Server
Hello @danpette ,
There is currently no way to remove it and we cannot do it by default due to many reasons including backward compatibility.
However, we can add an option that will let you choose if the NULL check should be done or not. I will speak with my developer to add this option.
Best Regards,
Jon
Hello @danpette ,
A new version has been released.
As said, we cannot do it by default but in this version, we added the option DisablePrimaryKeyNullCheck = true
It is possible to set it by default on your side:
EntityFrameworkManager.BulkOperationBuilder = builder =>
{
builder.DisablePrimaryKeyNullCheck = true;
};
Or by specific bulk operations.
Let me know if that option solves your performance issue.
On our side, we will try to improve this logic even more by perhaps doing only the null check by default if we are 100% sure that the column is nullable in the database.
Best Regards,
Jon
Great, thanks. We will check it out and get back to you :)
Your later options sound even better (as a future fix). Our workaround for this problem was to make the property nullable and recreate the index.
I will keep you updated on the later options as this is definitively something that we will investigate
Hello @danpette ,
The v5.2.12 has been released.
The DisablePrimaryKeyNullCheck should no longer be mandatory for scenarios that we can retrieve if the column is nullable or not. There is still 1 scenario we need to look but all others are supported.
Is it possible to test it and let us know if everything is working?
Best Regards,
Jon