rsmt2d icon indicating copy to clipboard operation
rsmt2d copied to clipboard

dataSquare.extendSquare can take advantage of loop reuse when building fillerExtendedRow and fillerRow

Open odeke-em opened this issue 1 year ago • 1 comments

While studying this code I noticed something that fillerRow is as long as ds.width+extendedWidth while fillerExtendedRow is as long as extendedWidth we could take advantage of loop reuse to build both fillerExtendedRow and fillerRow we can simply build them in the same loop and add a guard against the shorter length of extendedWidth

diff --git a/datasquare.go b/datasquare.go
index 0c16f59..5af113b 100644
--- a/datasquare.go
+++ b/datasquare.go
@@ -80,12 +80,12 @@ func (ds *dataSquare) extendSquare(extendedWidth uint, fillerChunk []byte) error
 	newSquareRow := make([][][]byte, newWidth)
 
 	fillerExtendedRow := make([][]byte, extendedWidth)
-	for i := uint(0); i < extendedWidth; i++ {
-		fillerExtendedRow[i] = fillerChunk
-	}
-
 	fillerRow := make([][]byte, newWidth)
 	for i := uint(0); i < newWidth; i++ {
+		if i < extendedWidth {
+			fillerExtendedRow[i] = fillerChunk
+		}
 		fillerRow[i] = fillerChunk
 	}

which produces interesting results

name                                          old time/op    new time/op    delta
Encoding/Leopard_128_shares_512-8               39.8µs ± 5%    39.0µs ± 3%     ~     (p=0.165 n=10+10)
Decoding/Leopard_128_shares_512-8                422ns ± 3%     424ns ± 4%     ~     (p=0.529 n=10+10)
EDSRoots/32x32x512_ODS-8                        5.92ms ± 4%    5.92ms ± 4%     ~     (p=0.853 n=10+10)
EDSRoots/64x64x512_ODS-8                        22.3ms ± 3%    22.2ms ± 2%     ~     (p=0.661 n=10+9)
EDSRoots/128x128x512_ODS-8                       106ms ±29%      87ms ± 4%  -17.51%  (p=0.022 n=9+10)
EDSRoots/256x256x512_ODS-8                       380ms ± 1%     350ms ± 3%   -7.90%  (p=0.000 n=8+9)
EDSRoots/512x512x512_ODS-8                       1.55s ±11%     1.35s ± 2%  -12.69%  (p=0.000 n=9+9)
Repair/Leopard_4x4x512_ODS-8                     370µs ± 6%     344µs ± 2%   -7.09%  (p=0.000 n=10+9)
Repair/Leopard_8x8x512_ODS-8                    1.37ms ± 3%    1.39ms ± 7%     ~     (p=0.315 n=9+10)
Repair/Leopard_16x16x512_ODS-8                  5.56ms ± 2%    5.43ms ± 1%   -2.23%  (p=0.000 n=10+10)
Repair/Leopard_32x32x512_ODS-8                  23.4ms ± 4%    22.3ms ± 4%   -4.90%  (p=0.001 n=9+9)
Repair/Leopard_64x64x512_ODS-8                  93.8ms ± 2%    89.5ms ± 2%   -4.60%  (p=0.000 n=9+10)
Repair/Leopard_128x128x512_ODS-8                 387ms ±16%     372ms ± 7%     ~     (p=0.912 n=10+10)
Repair/Leopard_256x256x512_ODS-8                 2.52s ± 5%     2.47s ± 2%     ~     (p=0.481 n=10+10)
Repair/Leopard_512x512x512_ODS-8                 8.93s ± 5%     8.66s ± 4%   -3.03%  (p=0.029 n=10+10)
ExtensionEncoding/Leopard_4x4x512_ODS-8         24.0µs ±10%    24.4µs ± 7%     ~     (p=0.190 n=10+10)
ExtensionEncoding/Leopard_8x8x512_ODS-8         61.9µs ±14%    59.2µs ± 4%     ~     (p=0.075 n=10+10)
ExtensionEncoding/Leopard_16x16x512_ODS-8        154µs ± 8%     148µs ± 3%   -4.23%  (p=0.010 n=10+9)
ExtensionEncoding/Leopard_32x32x512_ODS-8        474µs ± 5%     475µs ± 3%     ~     (p=0.400 n=9+10)
ExtensionEncoding/Leopard_64x64x512_ODS-8       1.74ms ± 9%    1.64ms ± 6%   -5.63%  (p=0.008 n=9+9)
ExtensionEncoding/Leopard_128x128x512_ODS-8     7.96ms ± 5%    7.04ms ± 1%  -11.56%  (p=0.000 n=7+10)
ExtensionEncoding/Leopard_256x256x512_ODS-8     40.6ms ±13%    36.9ms ±11%   -9.21%  (p=0.043 n=10+9)
ExtensionEncoding/Leopard_512x512x512_ODS-8      164ms ± 4%     164ms ± 5%     ~     (p=1.000 n=8+8)
ExtensionWithRoots/Leopard_4x4x512_ODS-8         171µs ± 6%     170µs ± 2%     ~     (p=0.604 n=10+9)
ExtensionWithRoots/Leopard_8x8x512_ODS-8         531µs ±10%     532µs ± 8%     ~     (p=0.739 n=10+10)
ExtensionWithRoots/Leopard_16x16x512_ODS-8      1.72ms ± 3%    1.79ms ±11%     ~     (p=0.133 n=10+9)
ExtensionWithRoots/Leopard_32x32x512_ODS-8      6.10ms ± 4%    7.32ms ±23%  +20.00%  (p=0.000 n=10+9)
ExtensionWithRoots/Leopard_64x64x512_ODS-8      23.6ms ± 2%    24.0ms ± 3%     ~     (p=0.136 n=9+9)
ExtensionWithRoots/Leopard_128x128x512_ODS-8    97.2ms ± 9%    96.4ms ± 8%     ~     (p=0.529 n=10+10)
ExtensionWithRoots/Leopard_256x256x512_ODS-8     459ms ±11%     399ms ± 5%  -13.06%  (p=0.000 n=10+9)
ExtensionWithRoots/Leopard_512x512x512_ODS-8     1.79s ± 4%     1.58s ± 4%  -12.04%  (p=0.000 n=9+8)

name                                          old alloc/op   new alloc/op   delta
Encoding/Leopard_128_shares_512-8                120kB ± 2%     118kB ± 2%   -1.58%  (p=0.043 n=10+10)
Decoding/Leopard_128_shares_512-8                0.00B          0.00B          ~     (all equal)
EDSRoots/32x32x512_ODS-8                        1.81MB ± 0%    1.81MB ± 0%     ~     (p=0.753 n=10+10)
EDSRoots/64x64x512_ODS-8                        6.24MB ± 0%    6.24MB ± 0%     ~     (p=1.000 n=9+10)
EDSRoots/128x128x512_ODS-8                      26.5MB ± 0%    26.5MB ± 0%     ~     (p=0.833 n=10+10)
EDSRoots/256x256x512_ODS-8                       109MB ± 0%     109MB ± 0%     ~     (p=0.435 n=9+9)
EDSRoots/512x512x512_ODS-8                       497MB ± 0%     497MB ± 0%     ~     (p=0.467 n=8+8)
Repair/Leopard_4x4x512_ODS-8                     163kB ± 1%     164kB ± 1%     ~     (p=0.190 n=10+10)
Repair/Leopard_8x8x512_ODS-8                     521kB ± 4%     526kB ± 4%     ~     (p=0.353 n=10+10)
Repair/Leopard_16x16x512_ODS-8                  1.89MB ± 9%    1.89MB ±10%     ~     (p=0.912 n=10+10)
Repair/Leopard_32x32x512_ODS-8                  7.68MB ± 5%    7.79MB ± 8%     ~     (p=0.280 n=10+10)
Repair/Leopard_64x64x512_ODS-8                  32.0MB ± 5%    33.5MB ± 7%   +4.51%  (p=0.009 n=10+10)
Repair/Leopard_128x128x512_ODS-8                 131MB ± 7%     133MB ±11%     ~     (p=0.280 n=10+10)
Repair/Leopard_256x256x512_ODS-8                 440MB ± 1%     440MB ± 0%     ~     (p=0.912 n=10+10)
Repair/Leopard_512x512x512_ODS-8                1.72GB ± 0%    1.72GB ± 0%     ~     (p=0.965 n=8+10)
ExtensionEncoding/Leopard_4x4x512_ODS-8         38.3kB ± 0%    38.3kB ± 0%     ~     (p=0.529 n=10+10)
ExtensionEncoding/Leopard_8x8x512_ODS-8          147kB ± 0%     147kB ± 1%     ~     (p=0.113 n=10+9)
ExtensionEncoding/Leopard_16x16x512_ODS-8        614kB ± 1%     612kB ± 0%     ~     (p=0.436 n=10+10)
ExtensionEncoding/Leopard_32x32x512_ODS-8       2.51MB ± 1%    2.52MB ± 1%     ~     (p=0.247 n=10+10)
ExtensionEncoding/Leopard_64x64x512_ODS-8       10.3MB ± 2%    10.2MB ± 1%     ~     (p=0.447 n=10+9)
ExtensionEncoding/Leopard_128x128x512_ODS-8     42.6MB ± 3%    42.9MB ± 3%     ~     (p=0.258 n=9+9)
ExtensionEncoding/Leopard_256x256x512_ODS-8      127MB ± 0%     127MB ± 0%     ~     (p=0.529 n=10+10)
ExtensionEncoding/Leopard_512x512x512_ODS-8      512MB ± 0%     512MB ± 0%     ~     (p=0.971 n=10+10)
ExtensionWithRoots/Leopard_4x4x512_ODS-8         119kB ± 0%     119kB ± 0%     ~     (p=0.404 n=10+10)
ExtensionWithRoots/Leopard_8x8x512_ODS-8         356kB ± 1%     357kB ± 1%     ~     (p=0.143 n=10+10)
ExtensionWithRoots/Leopard_16x16x512_ODS-8      1.20MB ± 2%    1.20MB ± 1%     ~     (p=0.475 n=10+7)
ExtensionWithRoots/Leopard_32x32x512_ODS-8      4.36MB ± 2%    4.34MB ± 2%     ~     (p=0.673 n=8+9)
ExtensionWithRoots/Leopard_64x64x512_ODS-8      16.5MB ± 5%    16.6MB ± 3%     ~     (p=0.604 n=10+9)
ExtensionWithRoots/Leopard_128x128x512_ODS-8    73.4MB ±19%    77.8MB ±16%     ~     (p=0.089 n=10+10)
ExtensionWithRoots/Leopard_256x256x512_ODS-8     236MB ± 0%     237MB ± 0%     ~     (p=0.280 n=10+10)
ExtensionWithRoots/Leopard_512x512x512_ODS-8    1.01GB ± 0%    1.01GB ± 0%     ~     (p=0.159 n=9+8)

name                                          old allocs/op  new allocs/op  delta
Encoding/Leopard_128_shares_512-8                  132 ± 0%       132 ± 0%     ~     (all equal)
Decoding/Leopard_128_shares_512-8                 0.00           0.00          ~     (all equal)
EDSRoots/32x32x512_ODS-8                         34.1k ± 0%     34.1k ± 0%     ~     (p=1.000 n=10+10)
EDSRoots/64x64x512_ODS-8                          134k ± 0%      134k ± 0%     ~     (p=0.550 n=10+10)
EDSRoots/128x128x512_ODS-8                        530k ± 0%      530k ± 0%     ~     (p=0.248 n=10+8)
EDSRoots/256x256x512_ODS-8                       2.11M ± 0%     2.11M ± 0%     ~     (p=0.480 n=10+10)
EDSRoots/512x512x512_ODS-8                       8.42M ± 0%     8.42M ± 0%     ~     (p=0.467 n=8+8)
Repair/Leopard_4x4x512_ODS-8                       772 ± 0%       772 ± 0%     ~     (all equal)
Repair/Leopard_8x8x512_ODS-8                     2.86k ± 0%     2.86k ± 0%     ~     (p=0.802 n=9+9)
Repair/Leopard_16x16x512_ODS-8                   10.9k ± 0%     10.9k ± 0%     ~     (p=0.698 n=10+10)
Repair/Leopard_32x32x512_ODS-8                   42.2k ± 0%     42.2k ± 0%     ~     (p=0.433 n=10+9)
Repair/Leopard_64x64x512_ODS-8                    166k ± 0%      166k ± 0%     ~     (p=0.985 n=10+10)
Repair/Leopard_128x128x512_ODS-8                  660k ± 0%      660k ± 0%     ~     (p=0.725 n=10+10)
Repair/Leopard_256x256x512_ODS-8                 2.64M ± 0%     2.63M ± 0%     ~     (p=0.853 n=10+10)
Repair/Leopard_512x512x512_ODS-8                 10.5M ± 0%     10.5M ± 0%     ~     (p=0.965 n=8+10)
ExtensionEncoding/Leopard_4x4x512_ODS-8            153 ± 0%       153 ± 0%     ~     (all equal)
ExtensionEncoding/Leopard_8x8x512_ODS-8            389 ± 0%       389 ± 0%     ~     (all equal)
ExtensionEncoding/Leopard_16x16x512_ODS-8        1.15k ± 0%     1.15k ± 0%     ~     (all equal)
ExtensionEncoding/Leopard_32x32x512_ODS-8        3.82k ± 0%     3.82k ± 0%     ~     (all equal)
ExtensionEncoding/Leopard_64x64x512_ODS-8        13.8k ± 0%     13.8k ± 0%     ~     (all equal)
ExtensionEncoding/Leopard_128x128x512_ODS-8      52.1k ± 0%     52.1k ± 0%     ~     (p=0.353 n=10+7)
ExtensionEncoding/Leopard_256x256x512_ODS-8       201k ± 0%      201k ± 0%     ~     (p=0.340 n=10+10)
ExtensionEncoding/Leopard_512x512x512_ODS-8       795k ± 0%      795k ± 0%     ~     (p=0.154 n=9+9)
ExtensionWithRoots/Leopard_4x4x512_ODS-8           830 ± 0%       830 ± 0%     ~     (all equal)
ExtensionWithRoots/Leopard_8x8x512_ODS-8         2.79k ± 0%     2.79k ± 0%     ~     (all equal)
ExtensionWithRoots/Leopard_16x16x512_ODS-8       10.1k ± 0%     10.1k ± 0%     ~     (all equal)
ExtensionWithRoots/Leopard_32x32x512_ODS-8       38.0k ± 0%     38.0k ± 0%     ~     (all equal)
ExtensionWithRoots/Leopard_64x64x512_ODS-8        148k ± 0%      148k ± 0%     ~     (all equal)
ExtensionWithRoots/Leopard_128x128x512_ODS-8      583k ± 0%      583k ± 0%     ~     (p=0.894 n=10+10)
ExtensionWithRoots/Leopard_256x256x512_ODS-8     2.31M ± 0%     2.31M ± 0%     ~     (p=0.617 n=10+9)
ExtensionWithRoots/Leopard_512x512x512_ODS-8     9.22M ± 0%     9.22M ± 0%     ~     (p=0.443 n=9+9)

/cc @elias-orijtech @liamsi @rootulp @musalbas @staheri14

odeke-em avatar Mar 05 '24 16:03 odeke-em

Agreed and it seem like the optimization you propose still works as expected.

The performance penalty comes at the cost of (in my subjective opinion) slightly less code readability so I think we should only implement this optimization if we need to.

I'm also puzzled by the few instances where benchmarking shows this optimization results in a positive delta. I.e:

ExtensionWithRoots/Leopard_32x32x512_ODS-8      6.10ms ± 4%    7.32ms ±23%  +20.00%  (p=0.000 n=10+9)

Repair/Leopard_64x64x512_ODS-8                  32.0MB ± 5%    33.5MB ± 7%   +4.51%  (p=0.009 n=10+10)

rootulp avatar Mar 05 '24 20:03 rootulp