pyspharm icon indicating copy to clipboard operation
pyspharm copied to clipboard

BENCH: set up benchmarks for the core transforms.

Open DWesl opened this issue 1 year ago • 3 comments

I was poking at the idea of modernizing the Fortran code, and there's a chance it could affect performance. There's a decent chance the compilers see if-then-else the same way as a block of gotos, but if not the more structured code might be easier to optimize.

Benchmarks based on airspeed velocity. Run directions and results below.

$ python -m asv run
· Creating environments
· Discovering benchmarks..
·· Uninstalling from virtualenv-py3.9.
·· Building 182d0348 <patch-2> for virtualenv-py3.9....
·· Installing 182d0348 <patch-2> into virtualenv-py3.9....
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[ 0.00%] · For pyspharm commit 182d0348 <patch-2>:
[ 0.00%] ·· Benchmarking virtualenv-py3.9
[25.00%] ··· Running (benchmarks.TimeSuite.time_grdtospec--).
[50.00%] ··· Running (benchmarks.TimeSuite.time_spectogrd--).
[75.00%] ··· benchmarks.TimeSuite.time_grdtospec                                                                       ok
[75.00%] ··· ======== ============ ============
             --                 method
             -------- -------------------------
              ntrunc    computed      stored
             ======== ============ ============
                21      97.2±4μs     72.6±7μs
                42      366±30μs     286±30μs
                63      939±80μs    713±300μs
                84     3.28±0.7ms   2.96±0.5ms
               127      8.89±1ms     6.35±2ms
               168      33.2±1ms     35.0±8ms
               252      90.8±5ms     102±10ms
             ======== ============ ============

[100.00%] ··· benchmarks.TimeSuite.time_spectogrd                                                                     ok
[100.00%] ··· ======== ============ ============
              --                 method
              -------- -------------------------
               ntrunc    computed      stored
              ======== ============ ============
                 21     115±100μs    85.2±20μs
                 42      249±40μs     184±20μs
                 63      759±90μs     448±20μs
                 84     1.47±0.2ms   1.00±0.1ms
                127     5.67±0.3ms   4.63±0.2ms
                168      10.6±1ms     11.2±2ms
                252      30.9±3ms    27.7±10ms
              ======== ============ ============

DWesl avatar Aug 31 '24 14:08 DWesl

Re-run of the benchmarks after passing complex coefficients to spectogrd. The 20% variation in the untouched grdtospec benchmark is apparently normal, so this would be useful primarily for finding large changes from one commit to the next, rather than tracking changes over time.

[75.00%] ··· benchmarks.TimeSuite.time_grdtospec                                                                      ok
[75.00%] ··· ======== ============= =============
             --                  method
             -------- ---------------------------
              ntrunc     computed       stored
             ======== ============= =============
                21       78.9±8μs      62.9±2μs
                42       296±9μs       228±2μs
                63      664±200μs     586±100μs
                84     2.88±0.04ms   2.70±0.04ms
               127      6.40±0.5ms    5.36±0.4ms
               168      28.1±0.7ms    27.2±0.5ms
               252       72.0±2ms      73.1±4ms
             ======== ============= =============

[100.00%] ··· benchmarks.TimeSuite.time_spectogrd                                                                    ok
[100.00%] ··· ======== ============ =============
              --                 method
              -------- --------------------------
               ntrunc    computed       stored
              ======== ============ =============
                 21      64.8±2μs     47.5±0.9μs
                 42      214±3μs       135±8μs
                 63      529±80μs      348±10μs
                 84     1.21±0.3ms     752±80μs
                127     4.73±0.3ms   3.87±0.02ms
                168     8.86±0.9ms    8.07±0.2ms
                252      27.1±1ms      23.4±1ms
              ======== ============ =============

DWesl avatar Sep 07 '24 16:09 DWesl

Thanks for all your recent contributions @DWesl! Not sure how I feel about touching the ancient NCAR fortran code though - seems like a slippery slope. The only reason I've found to use this instead of more modern libs (like SHTns) is the ability of SPHEREPACK to use a regularly spaced latitude grid (including the poles) without doubling the number of latitudes.

jswhit avatar Sep 21 '24 19:09 jswhit

Not sure how I feel about touching the ancient NCAR fortran code though - seems like a slippery slope.

My first thought was to wrap the files not exposed to python in Fortran 90 modules, but that broke the Python binding, so I'm inclined to agree with you.

The only reason I've found to use this instead of more modern libs (like SHTns) is the ability of SPHEREPACK to use a regularly spaced latitude grid (including the poles) without doubling the number of latitudes.

I think there's one other, which uses Clenshaw-Curtis quadrature rather than Gaussian for that integration, but it seems it upscales the grid internally as well.

I think there's a paper some while back showing exact transforms on a regular lat-lon grid can't happen without 2N latitudes, but frequently what we can get with N+1 latitudes is fine. (There's other papers investigating similar things with how well Clenshaw-Curtis quadrature deals with polynomials beyond the degree for which it is exact, compared to Gaussian quadrature, and extending that to certain rational functions). As partial evidence for that, the fast Legendre transform in the IFS is also approximate.

DWesl avatar Sep 23 '24 15:09 DWesl

I was poking at the idea of modernizing the Fortran code, and there's a chance it could affect performance. There's a decent chance the compilers see if-then-else the same way as a block of gotos, but if not the more structured code might be easier to optimize.

Benchmarks based on airspeed velocity. Run directions and results below.

$ python -m asv run
· Creating environments
· Discovering benchmarks..
·· Uninstalling from virtualenv-py3.9.
·· Building 182d0348 <patch-2> for virtualenv-py3.9....
·· Installing 182d0348 <patch-2> into virtualenv-py3.9....
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[ 0.00%] · For pyspharm commit 182d0348 <patch-2>:
[ 0.00%] ·· Benchmarking virtualenv-py3.9
[25.00%] ··· Running (benchmarks.TimeSuite.time_grdtospec--).
[50.00%] ··· Running (benchmarks.TimeSuite.time_spectogrd--).
[75.00%] ··· benchmarks.TimeSuite.time_grdtospec                                                                       ok
[75.00%] ··· ======== ============ ============
             --                 method
             -------- -------------------------
              ntrunc    computed      stored
             ======== ============ ============
                21      97.2±4μs     72.6±7μs
                42      366±30μs     286±30μs
                63      939±80μs    713±300μs
                84     3.28±0.7ms   2.96±0.5ms
               127      8.89±1ms     6.35±2ms
               168      33.2±1ms     35.0±8ms
               252      90.8±5ms     102±10ms
             ======== ============ ============

[100.00%] ··· benchmarks.TimeSuite.time_spectogrd                                                                     ok
[100.00%] ··· ======== ============ ============
              --                 method
              -------- -------------------------
               ntrunc    computed      stored
              ======== ============ ============
                 21     115±100μs    85.2±20μs
                 42      249±40μs     184±20μs
                 63      759±90μs     448±20μs
                 84     1.47±0.2ms   1.00±0.1ms
                127     5.67±0.3ms   4.63±0.2ms
                168      10.6±1ms     11.2±2ms
                252      30.9±3ms    27.7±10ms
              ======== ============ ============

@DWesl Hi, I've just rewritten the .f files from pyspharm into modern Fortran and added OpenMP SIMD directives to all the hot loops. My tests show a 10–20 % speed-up. If you're interested, take a look at the spharm submodule in my Skyborn repo:https://github.com/QianyeSu/Skyborn

QianyeSu avatar Aug 17 '25 04:08 QianyeSu