futhark icon indicating copy to clipboard operation
futhark copied to clipboard

Short circuiting

Open Munksgaard opened this issue 3 years ago • 2 comments

Munksgaard avatar Jul 14 '22 14:07 Munksgaard

Is this supposed to become Z3-free?

athas avatar Jul 14 '22 15:07 athas

Yes

Munksgaard avatar Jul 14 '22 15:07 Munksgaard

Everything has been merged!

This whole thing is a big ugly hairball and I am not proud of it. At some point I want to do a big rewrite: I think simplifying the pass to only work on one short-circuiting opportunity at a time (which is also how the algorithm is described in our paper) will do a lot to clean this code up. However, realistically, that is not going to happen anytime soon.

What I will do is to go over this PR and clean out the most obviously egregious bits (old comments, commented-out code and so on). After that is done, I suggest we merge the PR and hope that I can come back to it in the future. If all else fails, it shouldn't be hard to disable or remove the short-circuiting pass again at a later point.

I'll let you know once I've done some house cleaning. I might save it for next week.

Munksgaard avatar Oct 11 '22 11:10 Munksgaard

Sounds good! I know we took out the flashiest Z3-based parts, but I'm curious whether we'll see speedups on other benchmarks.

athas avatar Oct 11 '22 11:10 athas

Sounds good! I know we took out the flashiest Z3-based parts, but I'm curious whether we'll see speedups on other benchmarks.

I'll try it out on hpa01

Munksgaard avatar Oct 11 '22 11:10 Munksgaard

Here's the impact of short-circuiting on the A100 with 1000 runs

./accelerate/canny/canny.fut
  data/lena256.in:                                                      0.94x
  data/lena512.in:                                                      0.97x

./accelerate/crystal/crystal.fut
  #0 ("200i32 30.0f32 5i32 1i32 1.0f32"):                               0.91x
  #4 ("2000i32 30.0f32 50i32 1i32 1.0f32"):                             1.00x
  #5 ("4000i32 30.0f32 50i32 1i32 1.0f32"):                             1.00x

./accelerate/fft/fft.fut
  data/1024x1024.in:                                                    0.96x
  data/128x128.in:                                                      1.27x
  data/128x512.in:                                                      0.91x
  data/256x256.in:                                                      0.96x
  data/512x512.in:                                                      1.02x
  data/64x256.in:                                                       0.93x

./accelerate/fluid/fluid.fut
  benchmarking/medium.in:                                               0.99x

./accelerate/hashcat/hashcat.fut
  rockyou.dataset:                                                      1.00x

./accelerate/kmeans/kmeans.fut
  data/k5_n200000.in:                                                   1.01x
  data/k5_n50000.in:                                                    0.85x
  data/trivial.in:                                                      0.95x

./accelerate/mandelbrot/mandelbrot.fut
  #0 ("800i32 600i32 -0.7f32 0.0f32 3.067f32 100i32 16.0f..."):         0.99x
  #1 ("1000i32 1000i32 -0.7f32 0.0f32 3.067f32 100i32 16...."):         0.79x
  #2 ("2000i32 2000i32 -0.7f32 0.0f32 3.067f32 100i32 16...."):         0.95x
  #3 ("4000i32 4000i32 -0.7f32 0.0f32 3.067f32 100i32 16...."):         1.00x
  #4 ("8000i32 8000i32 -0.7f32 0.0f32 3.067f32 100i32 16...."):         1.00x

./accelerate/nbody/nbody-bh.fut
  data/1000-bodies.in:                                                  0.81x
  data/10000-bodies.in:                                                 0.86x
  data/100000-bodies.in:                                                0.92x

./accelerate/nbody/nbody.fut
  data/1000-bodies.in:                                                  0.77x
  data/10000-bodies.in:                                                 1.00x
  data/100000-bodies.in:                                                1.00x

./accelerate/pagerank/pagerank.fut
  data/random_medium.in:                                                1.00x
  data/small.in:                                                        0.97x

./accelerate/ray/trace.fut
  #0 ("800i32 600i32 100i32 50.0f32 -100.0f32 -700.0f32 1..."):         0.88x

./accelerate/smoothlife/smoothlife.fut
  #0 ("128i32"):                                                        0.95x
  #1 ("256i32"):                                                        1.01x
  #2 ("512i32"):                                                        1.00x
  #3 ("1024i32"):                                                       1.00x

./accelerate/tunnel/tunnel.fut
  #0 ("10.0f32 800i32 600i32"):                                         0.99x
  #1 ("10.0f32 1000i32 1000i32"):                                       1.03x
  #2 ("10.0f32 2000i32 2000i32"):                                       1.00x
  #3 ("10.0f32 4000i32 4000i32"):                                       1.00x
  #4 ("10.0f32 8000i32 8000i32"):                                       1.00x

./finpar/LocVolCalib.fut
  LocVolCalib-data/large.in:                                            1.00x
  LocVolCalib-data/medium.in:                                           1.00x
  LocVolCalib-data/small.in:                                            0.84x

./finpar/OptionPricing.fut
  OptionPricing-data/large.in:                                          1.19x (mem: 0.94x@device)
  OptionPricing-data/medium.in:                                         1.07x (mem: 0.99x@device)
  OptionPricing-data/small.in:                                          1.10x

./jgf/crypt/crypt.fut
  crypt-data/medium.in:                                                 0.91x

./jgf/crypt/keys.fut
  crypt-data/userkey0.txt:                                              1.68x (mem: 0.97x@device)

./jgf/series/series.fut
  data/10000.in:                                                        0.99x (mem: 0.50x@device)
  data/100000.in:                                                       1.00x (mem: 0.50x@device)
  data/1000000.in:                                                      1.00x (mem: 0.50x@device)

./misc/bfast/bfast-cloudy.fut
  data/africa.in:                                                       1.00x
  data/peru.in:                                                         1.00x
  data/sahara-cloudy.in:                                                1.00x

./misc/bfast/bfast.fut
  data/sahara.in:                                                       1.00x

./misc/heston/heston32.fut
  data/100000_quotes.in:                                                0.99x
  data/10000_quotes.in:                                                 0.92x
  data/1062_quotes.in:                                                  1.01x

./misc/heston/heston64.fut
  data/100000_quotes.in:                                                1.00x
  data/10000_quotes.in:                                                 1.01x
  data/1062_quotes.in:                                                  0.93x

./misc/knn-by-kdtree/buildKDtree.fut
  valid-data/kdtree-ppl-32-m-2097152.in:                                1.00x

./misc/knn-by-kdtree/driver-knn.fut
  256i32 [2097152][7]f32 [10000000][7]f32:                              1.03x

./misc/ocean-sim/tke.fut
  [200][200][100]f32 [200][200][100]f32 [200][200][100]f32 [...]:       2.27x
  data/tke32-small.in:                                                  1.02x

./misc/ocean-sim/tridiag-test.fut:tridagNested
  [57600][115]f32 [57600][115]f32 [57600][115]f32 [57600][115]f32:      1.00x
  data/tridiag32-small.in:                                              1.08x

./misc/ocean-sim/tridiag-test.fut:tridagNestedConst
  [57600][115]f32 [57600][115]f32 [57600][115]f32 [57600][115]f32:      1.00x
  data/tridiag32-small.in:                                              1.02x

./misc/ocean-sim/tridiag-test.fut:tridagNestedSeq
  [57600][115]f32 [57600][115]f32 [57600][115]f32 [57600][115]f32:      1.10x (mem: 0.91x@device)
  data/tridiag32-small.in:                                              1.04x (mem: 0.82x@device)

./misc/ocean-sim/tridiag-test.fut:tridagNestedSeqConst
  [57600][115]f32 [57600][115]f32 [57600][115]f32 [57600][115]f32:      1.11x
  data/tridiag32-small.in:                                              1.13x

./misc/poseidon/poseidon-bench.fut:arity11
  [17600000]u64:                                                        1.00x

./misc/poseidon/poseidon-bench.fut:arity8
  [22400000]u64:                                                        1.00x

./misc/radix_sort/radix_sort_blelloch_benchmark.fut
  data/radix_sort_100K.in:                                              0.92x
  data/radix_sort_10K.in:                                               0.72x
  data/radix_sort_1M.in:                                                1.02x

./misc/radix_sort/radix_sort_large.fut
  data/radix_sort_100K.in:                                              0.79x
  data/radix_sort_10K.in:                                               0.99x
  data/radix_sort_1M.in:                                                1.00x

./parboil/histo/histo.fut
  data/default.in:                                                      1.00x
  data/large.in:                                                        1.00x

./parboil/lbm/lbm.fut
  data/120_120_150_ldc.in:                                              1.00x

./parboil/mri-q/mri-q.fut
  data/large.in:                                                        1.00x
  data/small.in:                                                        1.00x

./parboil/sgemm/sgemm.fut
  data/medium.in:                                                       1.00x
  data/small.in:                                                        1.00x
  data/tiny.in:                                                         1.03x

./parboil/stencil/stencil.fut
  data/default.in:                                                      1.00x
  data/small.in:                                                        0.99x

./parboil/tpacf/tpacf.fut
  data/large.in:                                                        1.00x
  data/medium.in:                                                       1.00x
  data/small.in:                                                        1.02x

./pbbs/ray/ray.fut
  data/angel.in:                                                        1.00x
  data/dragon.in:                                                       1.00x
  data/happy.in:                                                        1.00x

./rodinia/backprop/backprop.fut
  data/medium.in:                                                       1.00x
  data/small.in:                                                        1.07x

./rodinia/bfs/bfs_asympt_ok_but_slow.fut
  data/4096nodes.in:                                                    1.03x
  data/512nodes_high_edge_variance.in:                                  1.06x
  data/64kn_32e-var-1-256-skew.in:                                      1.01x
  data/graph1MW_6.in:                                                   0.94x

./rodinia/bfs/bfs_filt_padded_fused.fut
  data/4096nodes.in:                                                    1.09x
  data/512nodes_high_edge_variance.in:                                  1.07x
  data/64kn_32e-var-1-256-skew.in:                                      1.01x
  data/graph1MW_6.in:                                                   1.05x

./rodinia/bfs/bfs_heuristic.fut
  data/4096nodes.in:                                                    1.07x
  data/512nodes_high_edge_variance.in:                                  1.09x
  data/64kn_32e-var-1-256-skew.in:                                      1.02x
  data/graph1MW_6.in:                                                   1.00x

./rodinia/bfs/bfs_iter_work_ok.fut
  data/4096nodes.in:                                                    1.07x
  data/512nodes_high_edge_variance.in:                                  1.04x
  data/64kn_32e-var-1-256-skew.in:                                      1.03x
  data/graph1MW_6.in:                                                   1.04x

./rodinia/cfd/cfd.fut
  data/fvcorr.domn.097K.toa:                                            1.00x
  data/fvcorr.domn.193K.toa:                                            1.00x

./rodinia/hotspot/hotspot.fut
  data/1024.in:                                                         1.01x
  data/512.in:                                                          1.27x
  data/64.in:                                                           1.08x

./rodinia/kmeans/kmeans.fut
  data/100.in:                                                          1.06x
  data/204800.in:                                                       1.05x
  data/kdd_cup.in:                                                      1.02x

./rodinia/lavaMD/lavaMD.fut
  data/10_boxes.in:                                                     1.01x
  data/3_boxes.in:                                                      1.01x

./rodinia/lud/lud-clean.fut
  [128][128][32][32]f32:                                                0.99x
  data/b32.in:                                                          0.68x

./rodinia/lud/lud-internal.fut
  [128][128][16][16]f32:                                                1.01x

./rodinia/lud/lud.fut
  data/16by16.in:                                                       1.30x
  data/2048.in:                                                         0.98x
  data/256.in:                                                          0.99x
  data/512.in:                                                          0.98x
  data/64.in:                                                           0.98x

./rodinia/myocyte/myocyte.fut
  data/medium.in:                                                       1.26x
  data/small.in:                                                        1.21x

./rodinia/nn/nn.fut
  data/medium.in:                                                       1.18x

./rodinia/nw/nw-cosmin.fut:nw_flat
  #3 ("3i64 10i32 [4i32, 2i32, 4i32, 9i32, 2i32, 1i32, 7i..."):         0.89x
  16i64 10i32 [2362369]i32 [2362369]i32:                                0.59x
  32i64 10i32 [2362369]i32 [2362369]i32:                                0.64x
  64i64 10i32 [2362369]i32 [2362369]i32:                                0.63x

./rodinia/nw/nw.fut
  data/large.in:                                                        0.70x
  data/medium.in:                                                       0.69x
  data/small.in:                                                        0.71x
  data/tiny.in:                                                         0.72x

./rodinia/particlefilter/particlefilter.fut
  data/128_128_10_image_10000_particles.in:                             1.03x
  data/128_128_10_image_400000_particles.in:                            1.00x

./rodinia/pathfinder/pathfinder.fut
  data/medium.in:                                                       0.82x

./rodinia/srad/srad.fut
  data/image.in:                                                        1.17x

./rsbench/rsbench.fut
  data/large.in:                                                        1.00x
  data/small.in:                                                        1.00x

./xsbench/xsbench.fut
  data/large.in:                                                        1.00x
  data/small.in:                                                        1.00x

The impact on OptionPricing, myocyte and nn are reproducible. I'm not sure what's going on with NW...

Munksgaard avatar Oct 17 '22 13:10 Munksgaard

@athas I have performed some superficial cleaning. It's still not pretty, but I think it's important to get it merged before master diverges even further. Please take a look and let me know if there are any egregious things you want me to change.

Munksgaard avatar Oct 19 '22 08:10 Munksgaard

Many of the tests have no input/output pairs and no structure stanza. I get that means they check if the compiler crashes, but my experience is that such tests are pretty fragile. I suggest adding structure unless absence of crashes is a major feat for those tests.

athas avatar Oct 19 '22 09:10 athas

I have added structure tests to all new tests.

Munksgaard avatar Oct 20 '22 09:10 Munksgaard

Differences in compile time seem minor and will hopefully be offset by future improvements elsewhere. Let's just merge it and see what happens.

athas avatar Oct 31 '22 16:10 athas

@Munksgaard Do you fancy writing an approachable blog post on futhark-lang.org for this optimisation?

athas avatar Oct 31 '22 16:10 athas

@Munksgaard Do you fancy writing an approachable blog post on futhark-lang.org for this optimisation?

Yes. Hopefully I'll be able to get around to it later this week.

Munksgaard avatar Oct 31 '22 17:10 Munksgaard