
Add content on performance (e.g. benchmarks, mention accelerators)

Open rgommers opened this issue 5 years ago • 5 comments

Related to https://github.com/numpy/numpy.org/issues/308#issuecomment-634612765 (connect content to "key features" on front page).

Adding content on benchmarks and accelerators (i.e. Cython, Numba, Pythran, Transonic) was also just suggested in https://mail.python.org/pipermail/numpy-discussion/2020-November/081248.html

rgommers avatar Nov 26 '20 22:11 rgommers

Do you have pointers on what those benchmarks should look like? Is there a preferred set of problems to test the code on? See, for example, https://julialang.org/benchmarks/

melissawm avatar Nov 27 '20 19:11 melissawm

The idea of that Julia page is about right, I think: short, with only a plot and no code. The main things I'd change from it:

  • fewer languages (really it's C, Fortran, Julia, R that matter)
  • keep pure Python and NumPy separate
  • add accelerators (Cython, Numba, Pythran, Transonic)
  • change it into a couple of sections, with one plot each (EDIT: I've got some reusable code for generating plots here). We should at least distinguish between problems that vectorize well and ones where NumPy isn't such a good fit

Off the top of my head I'm not sure about an existing widely used set of benchmarks to adopt.
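As a rough illustration of the vectorizable case, a minimal timing comparison might look like the sketch below (the row-wise norm problem is just a placeholder, not a proposed benchmark; the actual problems and plot code would still need to be chosen):

```python
import timeit

import numpy as np


def py_norms(points):
    # Pure Python: loop over rows, one Euclidean norm at a time
    return [sum(x * x for x in p) ** 0.5 for p in points]


def np_norms(points):
    # NumPy: vectorized over all rows at once
    return np.sqrt((points ** 2).sum(axis=1))


rng = np.random.default_rng(0)
pts = rng.random((10_000, 3))

t_py = timeit.timeit(lambda: py_norms(pts.tolist()), number=10)
t_np = timeit.timeit(lambda: np_norms(pts), number=10)
print(f"pure Python: {t_py:.4f}s  NumPy: {t_np:.4f}s  speedup ~{t_py / t_np:.0f}x")
```

Plotting the same kind of ratio for each problem/accelerator pair would give the per-section plots suggested above.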

rgommers avatar Nov 27 '20 19:11 rgommers

Thanks @rgommers for opening this issue!

Few remarks:

  • I guess it's better to keep things reasonably simple on this page, so that people can get a quick overview of what can be done. I wouldn't include too many problems.

  • I think it is better to use full problems from existing benchmark games (for example http://initialconditions.org/ or https://benchmarksgame-team.pages.debian.net/benchmarksgame/, code here) rather than only tiny micro-benchmarks (like those in https://julialang.org/benchmarks/), so that the code is seen in quasi-real-life situations (meaning not just a few functions defined in a Jupyter notebook). It would also be interesting to mention aspects other than elapsed time, for example readability, file size, technical difficulty, coding time, maintainability, etc. Optimizing is always a balance.

  • One advantage of Python is that it's possible to go step by step from very simple implementations (sometimes not very efficient) to more complex and more efficient ones. It would be nice to show that.

  • I don't think it is necessary to compare Transonic and Pythran. By default Transonic uses Pythran, so both tools end up with the same performance. Transonic just makes Pythran easier to use for real-life coding (except in Jupyter notebooks), with a Python API similar to Numba's and based on Python type annotations. Transonic can also use Numba and Cython as backends, but that's another story and I don't think it is necessary to go into such details on this page.

  • The N-body problem is a good candidate:

    • It's famous.
    • It was recently used in an article published in Nature Astronomy arguing against using Python.
    • One of its functions is not vectorizable.
    • We already have implementations in C++, Fortran and Julia.
    • We already have a very efficient implementation in Python (https://github.com/paugier/nbabel) and the result is spectacular (see the figure)!
  • It would also be interesting to give at least one example using OpenMP.

  • This article, https://onlinelibrary.wiley.com/iucr/doi/10.1107/S1600576719008471, is very interesting and rigorous; it should be cited.

  • It would be good to send two important messages about performance: (i) avoid premature optimization and (ii) measure, don't guess. We can at least mention cProfile.

  • It would also be good to honestly present some limitations of this strategy for accelerating Python code.
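The "simple first, then faster" progression suggested above could be illustrated with a tiny pairwise-interaction kernel. This is only a toy stand-in for the N-body computation, not the nbabel code itself, and later steps (Numba, Pythran, etc.) would follow the same pattern:

```python
import numpy as np


def potential_python(pos):
    """Step 1 -- naive O(N^2) double loop: the very simple implementation."""
    n = len(pos)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            dz = pos[i][2] - pos[j][2]
            total += 1.0 / (dx * dx + dy * dy + dz * dz) ** 0.5
    return total


def potential_numpy(pos):
    """Step 2 -- same computation, vectorized with broadcasting."""
    diff = pos[:, None, :] - pos[None, :, :]   # (N, N, 3) displacement vectors
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # (N, N) pairwise distances
    iu = np.triu_indices(len(pos), k=1)        # unique pairs only (i < j)
    return (1.0 / dist[iu]).sum()


rng = np.random.default_rng(42)
pos = rng.random((200, 3))
print(f"python: {potential_python(pos):.4f}  numpy: {potential_numpy(pos):.4f}")
```

Showing the same kernel at each step, with timings, would make the "steps of increasing effort and speed" point concrete without needing much page space.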

paugier avatar Nov 27 '20 21:11 paugier

It would also be interesting to mention aspects other than elapsed time, for example readability, file size, technical difficulty, coding time, maintainability, etc. Optimizing is always a balance.

That's a good point, yes.

It would also be interesting to give at least one example using OpenMP.

I don't think I'd want to get into that, at least not on the same page, because then we'd also have to touch on other forms of parallelism (e.g. Dask, multiprocessing, asyncio).

This article, https://onlinelibrary.wiley.com/iucr/doi/10.1107/S1600576719008471, is very interesting and rigorous; it should be cited.

Thanks, I wasn't aware of this article. It's really well-written.

It would be good to send two important messages about performance: (i) avoid premature optimization and (ii) measure, don't guess. We can at least mention cProfile.

I think the page really should focus on performance rather than turning into a tutorial. So this can be anywhere from one line to one paragraph, but it should link elsewhere for things like profiling.
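For the "measure, don't guess" line, a single standard-library snippet could be enough; something like the sketch below, where `simulation_step` is just a placeholder workload:

```python
import cProfile
import io
import pstats

import numpy as np


def simulation_step(a):
    # Placeholder workload standing in for real user code
    return np.fft.fft(a).real.sum()


profiler = cProfile.Profile()
profiler.enable()
for _ in range(50):
    simulation_step(np.ones(4096))
profiler.disable()

# Print the five entries with the largest cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

One line of prose around it ("profile before you optimize") plus a link to the profiling docs would keep the page focused.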

rgommers avatar Nov 28 '20 11:11 rgommers

Adding links to the recent Nature correspondence by @paugier et al.:

  • Article: https://rdcu.be/ciO0J
  • Benchmarks: https://github.com/paugier/nbabel
  • Twitter thread: https://twitter.com/pierre_augier/status/1385325261189787650

rgommers avatar May 25 '21 18:05 rgommers