dataSquare.extendSquare can take advantage of loop reuse when building fillerExtendedRow and fillerRow
While studying this code I noticed something that fillerRow is as long as ds.width+extendedWidth while fillerExtendedRow is as long as extendedWidth we could take advantage of loop reuse to build both fillerExtendedRow and fillerRow we can simply build them in the same loop and add a guard against the shorter length of extendedWidth
diff --git a/datasquare.go b/datasquare.go
index 0c16f59..5af113b 100644
--- a/datasquare.go
+++ b/datasquare.go
@@ -80,12 +80,12 @@ func (ds *dataSquare) extendSquare(extendedWidth uint, fillerChunk []byte) error
newSquareRow := make([][][]byte, newWidth)
fillerExtendedRow := make([][]byte, extendedWidth)
- for i := uint(0); i < extendedWidth; i++ {
- fillerExtendedRow[i] = fillerChunk
- }
-
fillerRow := make([][]byte, newWidth)
for i := uint(0); i < newWidth; i++ {
+ if i < extendedWidth {
+ fillerExtendedRow[i] = fillerChunk
+ }
fillerRow[i] = fillerChunk
}
which produces interesting results
name old time/op new time/op delta
Encoding/Leopard_128_shares_512-8 39.8µs ± 5% 39.0µs ± 3% ~ (p=0.165 n=10+10)
Decoding/Leopard_128_shares_512-8 422ns ± 3% 424ns ± 4% ~ (p=0.529 n=10+10)
EDSRoots/32x32x512_ODS-8 5.92ms ± 4% 5.92ms ± 4% ~ (p=0.853 n=10+10)
EDSRoots/64x64x512_ODS-8 22.3ms ± 3% 22.2ms ± 2% ~ (p=0.661 n=10+9)
EDSRoots/128x128x512_ODS-8 106ms ±29% 87ms ± 4% -17.51% (p=0.022 n=9+10)
EDSRoots/256x256x512_ODS-8 380ms ± 1% 350ms ± 3% -7.90% (p=0.000 n=8+9)
EDSRoots/512x512x512_ODS-8 1.55s ±11% 1.35s ± 2% -12.69% (p=0.000 n=9+9)
Repair/Leopard_4x4x512_ODS-8 370µs ± 6% 344µs ± 2% -7.09% (p=0.000 n=10+9)
Repair/Leopard_8x8x512_ODS-8 1.37ms ± 3% 1.39ms ± 7% ~ (p=0.315 n=9+10)
Repair/Leopard_16x16x512_ODS-8 5.56ms ± 2% 5.43ms ± 1% -2.23% (p=0.000 n=10+10)
Repair/Leopard_32x32x512_ODS-8 23.4ms ± 4% 22.3ms ± 4% -4.90% (p=0.001 n=9+9)
Repair/Leopard_64x64x512_ODS-8 93.8ms ± 2% 89.5ms ± 2% -4.60% (p=0.000 n=9+10)
Repair/Leopard_128x128x512_ODS-8 387ms ±16% 372ms ± 7% ~ (p=0.912 n=10+10)
Repair/Leopard_256x256x512_ODS-8 2.52s ± 5% 2.47s ± 2% ~ (p=0.481 n=10+10)
Repair/Leopard_512x512x512_ODS-8 8.93s ± 5% 8.66s ± 4% -3.03% (p=0.029 n=10+10)
ExtensionEncoding/Leopard_4x4x512_ODS-8 24.0µs ±10% 24.4µs ± 7% ~ (p=0.190 n=10+10)
ExtensionEncoding/Leopard_8x8x512_ODS-8 61.9µs ±14% 59.2µs ± 4% ~ (p=0.075 n=10+10)
ExtensionEncoding/Leopard_16x16x512_ODS-8 154µs ± 8% 148µs ± 3% -4.23% (p=0.010 n=10+9)
ExtensionEncoding/Leopard_32x32x512_ODS-8 474µs ± 5% 475µs ± 3% ~ (p=0.400 n=9+10)
ExtensionEncoding/Leopard_64x64x512_ODS-8 1.74ms ± 9% 1.64ms ± 6% -5.63% (p=0.008 n=9+9)
ExtensionEncoding/Leopard_128x128x512_ODS-8 7.96ms ± 5% 7.04ms ± 1% -11.56% (p=0.000 n=7+10)
ExtensionEncoding/Leopard_256x256x512_ODS-8 40.6ms ±13% 36.9ms ±11% -9.21% (p=0.043 n=10+9)
ExtensionEncoding/Leopard_512x512x512_ODS-8 164ms ± 4% 164ms ± 5% ~ (p=1.000 n=8+8)
ExtensionWithRoots/Leopard_4x4x512_ODS-8 171µs ± 6% 170µs ± 2% ~ (p=0.604 n=10+9)
ExtensionWithRoots/Leopard_8x8x512_ODS-8 531µs ±10% 532µs ± 8% ~ (p=0.739 n=10+10)
ExtensionWithRoots/Leopard_16x16x512_ODS-8 1.72ms ± 3% 1.79ms ±11% ~ (p=0.133 n=10+9)
ExtensionWithRoots/Leopard_32x32x512_ODS-8 6.10ms ± 4% 7.32ms ±23% +20.00% (p=0.000 n=10+9)
ExtensionWithRoots/Leopard_64x64x512_ODS-8 23.6ms ± 2% 24.0ms ± 3% ~ (p=0.136 n=9+9)
ExtensionWithRoots/Leopard_128x128x512_ODS-8 97.2ms ± 9% 96.4ms ± 8% ~ (p=0.529 n=10+10)
ExtensionWithRoots/Leopard_256x256x512_ODS-8 459ms ±11% 399ms ± 5% -13.06% (p=0.000 n=10+9)
ExtensionWithRoots/Leopard_512x512x512_ODS-8 1.79s ± 4% 1.58s ± 4% -12.04% (p=0.000 n=9+8)
name old alloc/op new alloc/op delta
Encoding/Leopard_128_shares_512-8 120kB ± 2% 118kB ± 2% -1.58% (p=0.043 n=10+10)
Decoding/Leopard_128_shares_512-8 0.00B 0.00B ~ (all equal)
EDSRoots/32x32x512_ODS-8 1.81MB ± 0% 1.81MB ± 0% ~ (p=0.753 n=10+10)
EDSRoots/64x64x512_ODS-8 6.24MB ± 0% 6.24MB ± 0% ~ (p=1.000 n=9+10)
EDSRoots/128x128x512_ODS-8 26.5MB ± 0% 26.5MB ± 0% ~ (p=0.833 n=10+10)
EDSRoots/256x256x512_ODS-8 109MB ± 0% 109MB ± 0% ~ (p=0.435 n=9+9)
EDSRoots/512x512x512_ODS-8 497MB ± 0% 497MB ± 0% ~ (p=0.467 n=8+8)
Repair/Leopard_4x4x512_ODS-8 163kB ± 1% 164kB ± 1% ~ (p=0.190 n=10+10)
Repair/Leopard_8x8x512_ODS-8 521kB ± 4% 526kB ± 4% ~ (p=0.353 n=10+10)
Repair/Leopard_16x16x512_ODS-8 1.89MB ± 9% 1.89MB ±10% ~ (p=0.912 n=10+10)
Repair/Leopard_32x32x512_ODS-8 7.68MB ± 5% 7.79MB ± 8% ~ (p=0.280 n=10+10)
Repair/Leopard_64x64x512_ODS-8 32.0MB ± 5% 33.5MB ± 7% +4.51% (p=0.009 n=10+10)
Repair/Leopard_128x128x512_ODS-8 131MB ± 7% 133MB ±11% ~ (p=0.280 n=10+10)
Repair/Leopard_256x256x512_ODS-8 440MB ± 1% 440MB ± 0% ~ (p=0.912 n=10+10)
Repair/Leopard_512x512x512_ODS-8 1.72GB ± 0% 1.72GB ± 0% ~ (p=0.965 n=8+10)
ExtensionEncoding/Leopard_4x4x512_ODS-8 38.3kB ± 0% 38.3kB ± 0% ~ (p=0.529 n=10+10)
ExtensionEncoding/Leopard_8x8x512_ODS-8 147kB ± 0% 147kB ± 1% ~ (p=0.113 n=10+9)
ExtensionEncoding/Leopard_16x16x512_ODS-8 614kB ± 1% 612kB ± 0% ~ (p=0.436 n=10+10)
ExtensionEncoding/Leopard_32x32x512_ODS-8 2.51MB ± 1% 2.52MB ± 1% ~ (p=0.247 n=10+10)
ExtensionEncoding/Leopard_64x64x512_ODS-8 10.3MB ± 2% 10.2MB ± 1% ~ (p=0.447 n=10+9)
ExtensionEncoding/Leopard_128x128x512_ODS-8 42.6MB ± 3% 42.9MB ± 3% ~ (p=0.258 n=9+9)
ExtensionEncoding/Leopard_256x256x512_ODS-8 127MB ± 0% 127MB ± 0% ~ (p=0.529 n=10+10)
ExtensionEncoding/Leopard_512x512x512_ODS-8 512MB ± 0% 512MB ± 0% ~ (p=0.971 n=10+10)
ExtensionWithRoots/Leopard_4x4x512_ODS-8 119kB ± 0% 119kB ± 0% ~ (p=0.404 n=10+10)
ExtensionWithRoots/Leopard_8x8x512_ODS-8 356kB ± 1% 357kB ± 1% ~ (p=0.143 n=10+10)
ExtensionWithRoots/Leopard_16x16x512_ODS-8 1.20MB ± 2% 1.20MB ± 1% ~ (p=0.475 n=10+7)
ExtensionWithRoots/Leopard_32x32x512_ODS-8 4.36MB ± 2% 4.34MB ± 2% ~ (p=0.673 n=8+9)
ExtensionWithRoots/Leopard_64x64x512_ODS-8 16.5MB ± 5% 16.6MB ± 3% ~ (p=0.604 n=10+9)
ExtensionWithRoots/Leopard_128x128x512_ODS-8 73.4MB ±19% 77.8MB ±16% ~ (p=0.089 n=10+10)
ExtensionWithRoots/Leopard_256x256x512_ODS-8 236MB ± 0% 237MB ± 0% ~ (p=0.280 n=10+10)
ExtensionWithRoots/Leopard_512x512x512_ODS-8 1.01GB ± 0% 1.01GB ± 0% ~ (p=0.159 n=9+8)
name old allocs/op new allocs/op delta
Encoding/Leopard_128_shares_512-8 132 ± 0% 132 ± 0% ~ (all equal)
Decoding/Leopard_128_shares_512-8 0.00 0.00 ~ (all equal)
EDSRoots/32x32x512_ODS-8 34.1k ± 0% 34.1k ± 0% ~ (p=1.000 n=10+10)
EDSRoots/64x64x512_ODS-8 134k ± 0% 134k ± 0% ~ (p=0.550 n=10+10)
EDSRoots/128x128x512_ODS-8 530k ± 0% 530k ± 0% ~ (p=0.248 n=10+8)
EDSRoots/256x256x512_ODS-8 2.11M ± 0% 2.11M ± 0% ~ (p=0.480 n=10+10)
EDSRoots/512x512x512_ODS-8 8.42M ± 0% 8.42M ± 0% ~ (p=0.467 n=8+8)
Repair/Leopard_4x4x512_ODS-8 772 ± 0% 772 ± 0% ~ (all equal)
Repair/Leopard_8x8x512_ODS-8 2.86k ± 0% 2.86k ± 0% ~ (p=0.802 n=9+9)
Repair/Leopard_16x16x512_ODS-8 10.9k ± 0% 10.9k ± 0% ~ (p=0.698 n=10+10)
Repair/Leopard_32x32x512_ODS-8 42.2k ± 0% 42.2k ± 0% ~ (p=0.433 n=10+9)
Repair/Leopard_64x64x512_ODS-8 166k ± 0% 166k ± 0% ~ (p=0.985 n=10+10)
Repair/Leopard_128x128x512_ODS-8 660k ± 0% 660k ± 0% ~ (p=0.725 n=10+10)
Repair/Leopard_256x256x512_ODS-8 2.64M ± 0% 2.63M ± 0% ~ (p=0.853 n=10+10)
Repair/Leopard_512x512x512_ODS-8 10.5M ± 0% 10.5M ± 0% ~ (p=0.965 n=8+10)
ExtensionEncoding/Leopard_4x4x512_ODS-8 153 ± 0% 153 ± 0% ~ (all equal)
ExtensionEncoding/Leopard_8x8x512_ODS-8 389 ± 0% 389 ± 0% ~ (all equal)
ExtensionEncoding/Leopard_16x16x512_ODS-8 1.15k ± 0% 1.15k ± 0% ~ (all equal)
ExtensionEncoding/Leopard_32x32x512_ODS-8 3.82k ± 0% 3.82k ± 0% ~ (all equal)
ExtensionEncoding/Leopard_64x64x512_ODS-8 13.8k ± 0% 13.8k ± 0% ~ (all equal)
ExtensionEncoding/Leopard_128x128x512_ODS-8 52.1k ± 0% 52.1k ± 0% ~ (p=0.353 n=10+7)
ExtensionEncoding/Leopard_256x256x512_ODS-8 201k ± 0% 201k ± 0% ~ (p=0.340 n=10+10)
ExtensionEncoding/Leopard_512x512x512_ODS-8 795k ± 0% 795k ± 0% ~ (p=0.154 n=9+9)
ExtensionWithRoots/Leopard_4x4x512_ODS-8 830 ± 0% 830 ± 0% ~ (all equal)
ExtensionWithRoots/Leopard_8x8x512_ODS-8 2.79k ± 0% 2.79k ± 0% ~ (all equal)
ExtensionWithRoots/Leopard_16x16x512_ODS-8 10.1k ± 0% 10.1k ± 0% ~ (all equal)
ExtensionWithRoots/Leopard_32x32x512_ODS-8 38.0k ± 0% 38.0k ± 0% ~ (all equal)
ExtensionWithRoots/Leopard_64x64x512_ODS-8 148k ± 0% 148k ± 0% ~ (all equal)
ExtensionWithRoots/Leopard_128x128x512_ODS-8 583k ± 0% 583k ± 0% ~ (p=0.894 n=10+10)
ExtensionWithRoots/Leopard_256x256x512_ODS-8 2.31M ± 0% 2.31M ± 0% ~ (p=0.617 n=10+9)
ExtensionWithRoots/Leopard_512x512x512_ODS-8 9.22M ± 0% 9.22M ± 0% ~ (p=0.443 n=9+9)
/cc @elias-orijtech @liamsi @rootulp @musalbas @staheri14
Agreed and it seem like the optimization you propose still works as expected.
The performance penalty comes at the cost of (in my subjective opinion) slightly less code readability so I think we should only implement this optimization if we need to.
I'm also puzzled by the few instances where benchmarking shows this optimization results in a positive delta. I.e:
ExtensionWithRoots/Leopard_32x32x512_ODS-8 6.10ms ± 4% 7.32ms ±23% +20.00% (p=0.000 n=10+9)
Repair/Leopard_64x64x512_ODS-8 32.0MB ± 5% 33.5MB ± 7% +4.51% (p=0.009 n=10+10)