pyspharm
BENCH: set up benchmarks for the core transforms.
I was poking at the idea of modernizing the Fortran code, which could affect performance. There's a decent chance the compilers see if-then-else the same way as a block of gotos, but if not, the more structured code might be easier to optimize.
These benchmarks use airspeed velocity (asv). Run directions and results are below.
$ python -m asv run
· Creating environments
· Discovering benchmarks..
·· Uninstalling from virtualenv-py3.9.
·· Building 182d0348 <patch-2> for virtualenv-py3.9....
·· Installing 182d0348 <patch-2> into virtualenv-py3.9....
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[ 0.00%] · For pyspharm commit 182d0348 <patch-2>:
[ 0.00%] ·· Benchmarking virtualenv-py3.9
[25.00%] ··· Running (benchmarks.TimeSuite.time_grdtospec--).
[50.00%] ··· Running (benchmarks.TimeSuite.time_spectogrd--).
[75.00%] ··· benchmarks.TimeSuite.time_grdtospec ok
[75.00%] ··· ======== ============ ============
             --                method
             -------- -------------------------
              ntrunc     computed       stored
             ======== ============ ============
                  21     97.2±4μs     72.6±7μs
                  42     366±30μs     286±30μs
                  63     939±80μs    713±300μs
                  84   3.28±0.7ms   2.96±0.5ms
                 127     8.89±1ms     6.35±2ms
                 168     33.2±1ms     35.0±8ms
                 252     90.8±5ms     102±10ms
             ======== ============ ============
[100.00%] ··· benchmarks.TimeSuite.time_spectogrd ok
[100.00%] ··· ======== ============ ============
              --                method
              -------- -------------------------
               ntrunc     computed       stored
              ======== ============ ============
                   21    115±100μs    85.2±20μs
                   42     249±40μs     184±20μs
                   63     759±90μs     448±20μs
                   84   1.47±0.2ms   1.00±0.1ms
                  127   5.67±0.3ms   4.63±0.2ms
                  168     10.6±1ms     11.2±2ms
                  252     30.9±3ms    27.7±10ms
              ======== ============ ============
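The benchmark module itself isn't quoted in this thread. As a rough sketch only, an asv suite that produces tables shaped like the ones above could look like this. It relies on asv's conventions (methods prefixed with `time_`, a `params`/`param_names` pair that asv expands into every combination, and `setup` receiving the same parameters) and presumably maps the `method` column onto pyspharm's `Spharmt(..., legfunc=...)` switch between "computed" and "stored" Legendre functions. The grid size (`nlat = ntrunc + 1`, `nlon = 2 * nlat`) and the random input field are assumptions, not necessarily what the PR uses.

```python
# Hypothetical reconstruction of benchmarks/benchmarks.py; not the PR's actual file.


class TimeSuite:
    """asv timing benchmarks for pyspharm's grid <-> spectral transforms."""

    # asv runs every (ntrunc, method) combination and tabulates the results.
    params = [[21, 42, 63, 84, 127, 168, 252], ["computed", "stored"]]
    param_names = ["ntrunc", "method"]

    def setup(self, ntrunc, method):
        # Imported lazily so the module can be inspected without pyspharm installed.
        import numpy as np
        import spharm

        # Assumed grid: ntrunc + 1 regularly spaced latitudes (poles included)
        # and twice as many longitudes. The PR's real grid size may differ.
        nlat = ntrunc + 1
        nlon = 2 * nlat
        # legfunc="stored" precomputes the Legendre functions when the instance
        # is created; legfunc="computed" evaluates them when a transform runs.
        self.sht = spharm.Spharmt(nlon, nlat, gridtype="regular", legfunc=method)
        rng = np.random.default_rng(0)
        self.grid = rng.standard_normal((nlat, nlon))
        # Complex spectral coefficients to feed spectogrd.
        self.spec = self.sht.grdtospec(self.grid, ntrunc)

    def time_grdtospec(self, ntrunc, method):
        self.sht.grdtospec(self.grid, ntrunc)

    def time_spectogrd(self, ntrunc, method):
        self.sht.spectogrd(self.spec)
```

Keeping the `Spharmt` construction in `setup` means it is excluded from the timings, so the "stored" column measures only the transforms and not the one-off Legendre precomputation.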
Re-run of the benchmarks after passing complex coefficients to spectogrd. The roughly 20% variation in the untouched grdtospec benchmark is apparently normal, so the suite is mainly useful for catching large changes from one commit to the next rather than for tracking performance over time.
[75.00%] ··· benchmarks.TimeSuite.time_grdtospec ok
[75.00%] ··· ======== ============= =============
             --                 method
             -------- ---------------------------
              ntrunc      computed        stored
             ======== ============= =============
                  21      78.9±8μs      62.9±2μs
                  42       296±9μs       228±2μs
                  63     664±200μs     586±100μs
                  84   2.88±0.04ms   2.70±0.04ms
                 127    6.40±0.5ms    5.36±0.4ms
                 168    28.1±0.7ms    27.2±0.5ms
                 252      72.0±2ms      73.1±4ms
             ======== ============= =============
[100.00%] ··· benchmarks.TimeSuite.time_spectogrd ok
[100.00%] ··· ======== ============ =============
              --                 method
              -------- --------------------------
               ntrunc     computed        stored
              ======== ============ =============
                   21     64.8±2μs    47.5±0.9μs
                   42      214±3μs       135±8μs
                   63     529±80μs      348±10μs
                   84   1.21±0.3ms      752±80μs
                  127   4.73±0.3ms   3.87±0.02ms
                  168   8.86±0.9ms    8.07±0.2ms
                  252     27.1±1ms      23.4±1ms
              ======== ============ =============
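Because grdtospec was untouched between the two runs, its drift is a direct read on run-to-run noise. A small stdlib-only sketch, assuming the "value±spread unit" strings shown in the tables above, that parses the timings and flags relative changes larger than a chosen threshold. The helper names are hypothetical, and the 0.2 default only mirrors the 20% figure quoted above; it is not an asv setting.

```python
import re

# asv-style timings as printed above: "<value>±<spread><unit>", e.g. "97.2±4μs".
_TIMING = re.compile(r"^\s*([0-9.]+)±([0-9.]+)\s*(ns|[µμu]s|ms|s)\s*$")
_SCALE = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_asv_time(text):
    """Return (value, spread) in seconds for a string such as '97.2±4μs'."""
    match = _TIMING.match(text)
    if match is None:
        raise ValueError(f"unrecognised timing: {text!r}")
    value, spread, unit = match.groups()
    # Normalise both micro-sign code points to "u" before looking up the scale.
    unit = unit.replace("µ", "u").replace("μ", "u")
    scale = _SCALE[unit]
    return float(value) * scale, float(spread) * scale


def relative_change(before, after):
    """Fractional change of the central value, (after - before) / before."""
    b, _ = parse_asv_time(before)
    a, _ = parse_asv_time(after)
    return (a - b) / b


def is_significant(before, after, threshold=0.2):
    """Flag a change only if it exceeds the assumed run-to-run noise level."""
    return abs(relative_change(before, after)) > threshold


# grdtospec "computed" column of the two runs above, ntrunc = 21 to 252.
NTRUNC = [21, 42, 63, 84, 127, 168, 252]
RUN_1 = ["97.2±4μs", "366±30μs", "939±80μs", "3.28±0.7ms", "8.89±1ms", "33.2±1ms", "90.8±5ms"]
RUN_2 = ["78.9±8μs", "296±9μs", "664±200μs", "2.88±0.04ms", "6.40±0.5ms", "28.1±0.7ms", "72.0±2ms"]

if __name__ == "__main__":
    for ntrunc, before, after in zip(NTRUNC, RUN_1, RUN_2):
        mark = "  above threshold" if is_significant(before, after) else ""
        print(f"ntrunc={ntrunc:>3}: {relative_change(before, after):+.1%}{mark}")
```

Applied to the untouched grdtospec "computed" column, three of the seven shifts (ntrunc 63, 127 and 252) come out beyond 20%, consistent with the note that only large changes stand out from the noise.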
Thanks for all your recent contributions @DWesl! Not sure how I feel about touching the ancient NCAR Fortran code though - seems like a slippery slope. The only reason I've found to use this instead of more modern libs (like SHTns) is the ability of SPHEREPACK to use a regularly spaced latitude grid (including the poles) without doubling the number of latitudes.
> Not sure how I feel about touching the ancient NCAR Fortran code though - seems like a slippery slope.
My first thought was to wrap the files not exposed to Python in Fortran 90 modules, but that broke the Python binding, so I'm inclined to agree with you.
> The only reason I've found to use this instead of more modern libs (like SHTns) is the ability of SPHEREPACK to use a regularly spaced latitude grid (including the poles) without doubling the number of latitudes.
I think there's one other library that uses Clenshaw-Curtis quadrature rather than Gaussian quadrature for that integration, but it seems to upscale the grid internally as well.
I think there was a paper a while back showing that exact transforms on a regular lat-lon grid require 2N latitudes, but what we get with N+1 latitudes is frequently fine. (There are other papers investigating similar questions: how well Clenshaw-Curtis quadrature handles polynomials beyond the degree for which it is exact, compared with Gaussian quadrature, and extending that to certain rational functions.) As partial evidence, the fast Legendre transform in the IFS is also approximate.
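The Clenshaw-Curtis versus Gaussian comparison above comes down to degree of exactness. As an illustration only (a stdlib sketch with hypothetical helper names, not how SPHEREPACK, SHTns or the IFS actually compute their transforms): under x = cos(colatitude), equally spaced colatitudes including both poles are exactly the Clenshaw-Curtis nodes, and for a field truncated at total wavenumber N each Legendre-analysis integrand is a polynomial in x of degree at most 2N. An (n+1)-point Clenshaw-Curtis rule is exact up to degree n (n+1 when n is even), while an m-point Gauss-Legendre rule is exact up to degree 2m-1; the sketch checks this numerically by integrating monomials over [-1, 1].

```python
import math


def clenshaw_curtis(n):
    """Nodes and weights of the (n + 1)-point Clenshaw-Curtis rule on [-1, 1].

    The nodes x_j = cos(j * pi / n) are the images of equally spaced
    colatitudes j * pi / n, j = 0..n, so both poles are included.
    """
    nodes, weights = [], []
    for j in range(n + 1):
        theta = math.pi * j / n
        s = 0.0
        for k in range(1, n // 2 + 1):
            b = 1.0 if 2 * k == n else 2.0  # the k = n/2 term is halved
            s += b * math.cos(2 * k * theta) / (4 * k * k - 1)
        c = 1.0 if j in (0, n) else 2.0  # endpoint (pole) weights are halved
        nodes.append(math.cos(theta))
        weights.append(c / n * (1.0 - s))
    return nodes, weights


def gauss_legendre(m, tol=1e-15):
    """Nodes and weights of the m-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, m + 1):
        x = math.cos(math.pi * (i - 0.25) / (m + 0.5))  # initial guess for root i
        for _ in range(100):
            # Three-term recurrence for P_m(x); p_prev ends as P_{m-1}(x).
            p_prev, p = 1.0, x
            for k in range(2, m + 1):
                p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
            dp = m * (x * p - p_prev) / (x * x - 1.0)  # P_m'(x)
            step = p / dp
            x -= step  # Newton update
            if abs(step) < tol:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def exact_degree(nodes, weights, max_degree=60, tol=1e-10):
    """Largest d such that x**k is integrated exactly for every k <= d."""
    for d in range(max_degree + 1):
        approx = sum(w * x**d for x, w in zip(nodes, weights))
        exact = 0.0 if d % 2 else 2.0 / (d + 1)
        if abs(approx - exact) > tol:
            return d - 1
    return max_degree


if __name__ == "__main__":
    n_trunc = 8  # illustrative truncation; integrands have degree <= 2 * n_trunc
    print("Gauss, N+1 points:", exact_degree(*gauss_legendre(n_trunc + 1)))
    print("Clenshaw-Curtis, N+1 points:", exact_degree(*clenshaw_curtis(n_trunc)))
    print("Clenshaw-Curtis, 2N+1 points:", exact_degree(*clenshaw_curtis(2 * n_trunc)))
```

With N = 8, the nine-point Gaussian rule reaches degree 17 (enough for 2N = 16), the nine-point Clenshaw-Curtis rule stops at degree 9, and Clenshaw-Curtis needs 17 points to reach degree 17, a near doubling consistent with the 2N-latitude remark above.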
@DWesl Hi, I've just rewritten the .f files from pyspharm in modern Fortran and added OpenMP SIMD directives to all the hot loops. My tests show a 10–20% speed-up. If you're interested, take a look at the spharm submodule in my Skyborn repo: https://github.com/QianyeSu/Skyborn