denque icon indicating copy to clipboard operation
denque copied to clipboard

Performance improvement by specifying initial array size

Open nemosmithasf opened this issue 3 years ago • 8 comments

Please correct me if I'm wrong, but wouldn't it be more performant if we can specify an initial array capacity rather than rely on the default size of 4? Especially if we are using it as a circular buffer...

I assume this would save us a few "grow array" operations?

nemosmithasf avatar Jun 26 '22 04:06 nemosmithasf

I did some benchmarking and it seems changing the default array size significant deteriorates the performance.

That's really interesting. I couldn't figure out why from looking at the source. For my own edification, could you explain why this is? Or perhaps point to some part of the code that I can look at in closer detail to figure it out?

nemosmithasf avatar Jun 29 '22 01:06 nemosmithasf

Can you share how / what you benchmarked to come to that conclusion? I am also currently looking at possibly improving the performance of this library in certain situations. Especially the actual growth operation is currently pretty inefficient and I have seen quite significant improvements in situations where it needs to grow a lot

dnlmlr avatar Jul 12 '22 14:07 dnlmlr

I just copied the code into my codebase and changed 4 to a value that is closer to what I anticipated my average queue size to be.

I then ran a benchmark on my codebase and compared having 4 and the new value as the default; and found that it was slower.

That said, my benchmark wasn't especially scientific, so don't treat it as authoritative- I'd definitely do independent testing if I were you.

nemosmithasf avatar Jul 14 '22 00:07 nemosmithasf

Only changing the new Array(4) will not work since the wrap-around is not calculated by doing index % this._list.length, but by index & this._capacityMask. This means that the capacity must always be a multiple of 2 for the wrapping to work and the _capacityMask must be also set to the matching value. The _capacityMask is always capacity - 1, which creates a binary mask that has 1 at all bit positions up to the capacity (for example cap = 16 = 0b10000 and mask = 15 = 0b01111).

For testing you could try to set the capacity to the next larger power of 2, e.g. new Array(1024) and the mask to 1 less this._capacityMask = 1023

EDIT: In this text when I say "capacity" I refer to the Array size (this._list.length / new Array(N)) and not the this._capacity variable. this._capacity is something different and not directly related to this conversation.

dnlmlr avatar Jul 14 '22 13:07 dnlmlr

I did a small test implementation of this with a matching benchmark and it seems to work. Tested at 1_000_000, 1_000 and 100 element insertions it is consistently faster when you know the capacity before.

  • 1_000_000 elements: 58% faster
  • 1_000 elements: 56% faster
  • 100 elements: 154% faster
Platform info:
Windows_NT 10.0.19044 x64
Node.JS 18.5.0
V8 10.2.154.4-node.8
AMD Ryzen 7 2700X Eight-Core Processor          × 16

Denque (1000000) default capacity x 87.24 ops/sec ±1.67% (73 runs sampled)
Denque (1000000) preallocated capacity x 138 ops/sec ±0.58% (77 runs sampled)
Denque (1000) default capacity x 144,139 ops/sec ±2.03% (87 runs sampled)
Denque (1000) preallocated capacity x 225,468 ops/sec ±0.48% (92 runs sampled)
Denque (100) default capacity x 700,648 ops/sec ±0.36% (95 runs sampled)
Denque (100) preallocated capacity x 1,785,077 ops/sec ±0.61% (95 runs sampled)
DoubleEndedQueue (1000000) default capacity x 32.08 ops/sec ±2.08% (56 runs sampled)
DoubleEndedQueue (1000000) preallocated capacity x 30.67 ops/sec ±2.00% (55 runs sampled)

Here is the code. To run the benchmark do node benchmark/initialCapacity.js

One question about the possible implementation is how the initial capacity should be specified. In the example I made the first constructor arg be either array or number.

dnlmlr avatar Jul 16 '22 21:07 dnlmlr

Only changing the new Array(4) will not work since the wrap-around is not calculated by doing index % this._list.length, but by index & this._capacityMask. This means that the capacity must always be a multiple of 2 for the wrapping to work and the _capacityMask must be also set to the matching value. The _capacityMask is always capacity - 1, which creates a binary mask that has 1 at all bit positions up to the capacity (for example cap = 16 = 0b10000 and mask = 15 = 0b01111).

Thanks for this explanation, though I'd be lying if I said I understood it all. Still, it seems like a fascinating technique, definitely something I'll look into.

One question about the possible implementation is how the initial capacity should be specified. In the example I made the first constructor arg be either array or number.

Wouldn't it made sense to just add a property to the options object at arg_1?

nemosmithasf avatar Jul 18 '22 19:07 nemosmithasf

Wouldn't it made sense to just add a property to the options object at arg_1?

Yeah that would probably be the other option. The problem I could see with that is that you need to specify both args in order to use the options object. So that means you would have to give an empty array as arg1 which makes the interface kinda klunky. Also the options argument is currently not even documented, not sure what is up with that

dnlmlr avatar Jul 19 '22 20:07 dnlmlr

you need to specify both args in order to use the options object.

You can solve it by overloading, since arg_0 can never be an object.

nemosmithasf avatar Jul 20 '22 00:07 nemosmithasf