Dapper
Slowness when using a large number of dynamic parameters. Fixes #1537
Calling command.Parameters.Contains(name) for every parameter is an n-squared algorithm for many database implementations and can cause performance degradation when there is a large number of parameters. This fixes that performance issue.
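A minimal sketch of the approach (illustrative names, not Dapper's actual internals): track the names already added in a HashSet so the duplicate check is O(1) per parameter rather than a linear scan through the provider's collection.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class ParameterSketch
{
    // Illustrative only: add each parameter once, using a HashSet for the
    // duplicate-name check instead of command.Parameters.Contains(name),
    // which many providers implement as a linear scan (O(n) per call,
    // O(n^2) for the whole loop).
    public static void AddParameters(IDbCommand command, IEnumerable<IDbDataParameter> parameters)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        foreach (var p in parameters)
        {
            if (seen.Add(p.ParameterName))
            {
                command.Parameters.Add(p); // first occurrence of this name
            }
            else
            {
                // duplicate name: overwrite the existing parameter's value
                ((IDbDataParameter)command.Parameters[p.ParameterName]).Value = p.Value;
            }
        }
    }
}
```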
I like this, but can I ask you to pull in (or at least get your thoughts on) https://github.com/DapperLib/Dapper/commit/ddc673711355634eed237c4a7b49a37205f6cf21? This avoids paying the overhead for small scenarios, which is the majority of cases.
(I'm trying not to mess up the commit history by stomping it)
I think your changes are good. You might even set the threshold to 20 instead of 10, but some benchmarking would be necessary to pinpoint the sweet spot where the Dictionary setup overhead outweighs the cost of the linear parameter lookups.
@jehhynes do you want to cherry-pick it into your branch so that I can do everything in one squashed merge?
@mgravell Knowledge is power. I put together a benchmark so we can see what we're dealing with. I think a threshold of 8 or 10 would be fine, but it isn't really necessary: the straightforward implementation of always using the HashSet/Dictionary performs well in the single digits as well as higher.
If we really wanted to be anal about it, we could compare my HashSet implementation to your Dictionary implementation. From my research, HashSet may actually be a tad faster, since it stores only the keys and not a reference to the parameter.
Without the dictionary, in the pathological case we're still at the mercy of whatever the line p = (IDbDataParameter)command.Parameters[name]; chooses to do. If the parameters are indexed, such that this is not O(N), I would expect Contains to use that index; hence my thinking that if we're going to the effort of creating a hash, we might as well go whole hog and create a dictionary.
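A rough sketch of that dictionary variant (illustrative, not the code from ddc6737): caching the IDbDataParameter references means both the existence check and the overwrite avoid the provider's possibly-O(N) string indexer entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class DictionarySketch
{
    // Illustrative only: the dictionary caches the parameter objects
    // themselves, so the overwrite path never touches the provider's
    // name indexer at all.
    public static void AddParameters(IDbCommand command, IEnumerable<IDbDataParameter> parameters)
    {
        var byName = new Dictionary<string, IDbDataParameter>(StringComparer.Ordinal);
        foreach (var p in parameters)
        {
            if (byName.TryGetValue(p.ParameterName, out var existing))
            {
                existing.Value = p.Value; // duplicate: overwrite via cached reference
            }
            else
            {
                command.Parameters.Add(p);
                byName.Add(p.ParameterName, p);
            }
        }
    }
}
```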
But I will acknowledge: this only impacts the bad case where parameters are duplicated and are being overwritten. In reality, that will almost never be hit. If you have the code already, I'd be interested in the timings for HashSet instead of Dictionary, although I don't expect it to be very different.
I guess I'm also philosophically opposed to adding the (albeit minor) allocation overhead in the "it doesn't matter" scenario ;)
and yes, I know "let's avoid a few allocations" is almost comical when dealing with ADO.NET, which is famously allocation-heavy
@mgravell Thanks for the explanation of your approach, I do see the benefits. I guess we had to do the following comparison at some point :-)
If you're interested, here's what I used to generate the comparison:
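(The snippet itself isn't shown here; below is a rough reconstruction of what such a comparison could look like, assuming BenchmarkDotNet and Microsoft.Data.SqlClient; neither tool choice is confirmed by the thread.)

```csharp
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.Data.SqlClient;

public class ContainsBenchmarks
{
    [Params(5, 10, 20, 100, 1000)]
    public int Count;

    [Benchmark(Baseline = true)]
    public void ProviderContains()
    {
        using var cmd = new SqlCommand();
        for (int i = 0; i < Count; i++)
        {
            var name = "@p" + i;
            if (!cmd.Parameters.Contains(name))   // linear scan on every call
                cmd.Parameters.AddWithValue(name, i);
        }
    }

    [Benchmark]
    public void HashSetCheck()
    {
        using var cmd = new SqlCommand();
        var seen = new HashSet<string>();
        for (int i = 0; i < Count; i++)
        {
            var name = "@p" + i;
            if (seen.Add(name))                   // O(1) on every call
                cmd.Parameters.AddWithValue(name, i);
        }
    }
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<ContainsBenchmarks>();
}
```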
I think one limitation with your approach is that you're always evaluating the first 10 checks using the expensive n-squared algorithm, even if more than 10 parameters will eventually be added and you'll therefore have to create a dictionary anyway. I think if you checked parameters.Count instead of command.Parameters.Count, you could overcome this issue and bring your implementation's performance in line with mine.
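A sketch of that suggested tweak (illustrative names again, assuming the incoming collection exposes a Count): the decision to build the dictionary is made up front from the number of parameters that will be added, not from how many have been added so far.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class ThresholdSketch
{
    private const int Threshold = 10;

    // Illustrative only: because parameters.Count is known before the loop,
    // the dictionary can be built immediately when it will be needed, rather
    // than paying for Threshold linear-scan Contains calls first.
    public static void AddParameters(IDbCommand command, IReadOnlyCollection<IDbDataParameter> parameters)
    {
        Dictionary<string, IDbDataParameter> byName = parameters.Count > Threshold
            ? new Dictionary<string, IDbDataParameter>(StringComparer.Ordinal)
            : null; // small case: the provider's linear scan is cheap enough

        foreach (var p in parameters)
        {
            if (byName != null)
            {
                if (byName.TryGetValue(p.ParameterName, out var existing))
                {
                    existing.Value = p.Value; // duplicate: overwrite
                }
                else
                {
                    command.Parameters.Add(p);
                    byName.Add(p.ParameterName, p);
                }
            }
            else if (command.Parameters.Contains(p.ParameterName))
            {
                ((IDbDataParameter)command.Parameters[p.ParameterName]).Value = p.Value;
            }
            else
            {
                command.Parameters.Add(p);
            }
        }
    }
}
```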
As you appear to have specific philosophical ideas about how this ought to be implemented, I'm going to pass this back to you to put in the finishing touches and merge it in. Thanks for your attention to this issue!