from-python-to-numpy
Anatomy of an array introduction. Obvious way is the fastest.
Hello, I've tried this code:
Z = np.ones(4 * 1000000, np.float32)
timeit("Z[...] = 0", globals())
timeit("Z.view(np.float16)[...] = 0", globals())
timeit("Z.view(np.int16)[...] = 0", globals())
timeit("Z.view(np.int32)[...] = 0", globals())
timeit("Z.view(np.float32)[...] = 0", globals())
timeit("Z.view(np.int64)[...] = 0", globals())
timeit("Z.view(np.float64)[...] = 0", globals())
timeit("Z.view(np.complex128)[...] = 0", globals())
timeit("Z.view(np.int8)[...] = 0", globals())
It gave the following results:
100 loops, best of 3: 905 usec per loop
100 loops, best of 3: 918 usec per loop
100 loops, best of 3: 925 usec per loop
100 loops, best of 3: 915 usec per loop
100 loops, best of 3: 910 usec per loop
100 loops, best of 3: 912 usec per loop
100 loops, best of 3: 902 usec per loop
100 loops, best of 3: 1.9 msec per loop
100 loops, best of 3: 1.91 msec per loop
I don't understand the root cause of such contrary results. Could you kindly clarify? Thanks in advance.
P.S. I'm using the 64-bit version of Python 3.5.2 with Anaconda. The sysinfo() output:
Date: 01/02/17
Python: 3.5.2
Numpy: 1.11.1
Scipy: 0.17.1
Matplotlib: 1.5.1
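For context on what the statements above actually time, here is a minimal sketch (not taken from the book) of how each view reinterprets the same buffer. The `view_sizes` dictionary is a hypothetical name introduced only for illustration. Each view shares the 16,000,000 bytes of `Z`; only the element size, and therefore the number of elements written, differs.

```python
import numpy as np

Z = np.ones(4 * 1000000, np.float32)

# Hypothetical helper dict: dtype name -> (bytes per element, element count)
view_sizes = {}
for dtype in (np.int8, np.float16, np.float32, np.float64, np.complex128):
    V = Z.view(dtype)
    # A view never copies: it points at the same memory as Z.
    assert np.shares_memory(V, Z)
    view_sizes[np.dtype(dtype).name] = (V.itemsize, V.size)
    print(np.dtype(dtype).name, V.itemsize, V.size, V.nbytes)
```

The total number of bytes written is identical in every case, so any timing difference has to come from how the assignment loop handles a given element type, not from the amount of memory touched.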
Thanks for the report. Your results are surprising. Could you also test using IPython and the %timeit magic (just to be sure I did not mess up the timeit function)?
Note: I edited your post because the listing was not displayed properly.
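For readers without IPython at hand, a rough cross-check is also possible with the standard library's `timeit` module. This is only a sketch: `best_time` is a hypothetical helper, not the book's `timeit` function, and the loop and repeat counts are assumptions chosen to mirror the "100 loops, best of 3" output above.

```python
import timeit

import numpy as np

Z = np.ones(4 * 1000000, np.float32)


def best_time(stmt, number=100, repeat=3):
    """Return the best per-loop time in seconds (hypothetical helper)."""
    # timeit.repeat returns one total time per repetition.
    totals = timeit.repeat(
        stmt, globals={"Z": Z, "np": np}, number=number, repeat=repeat
    )
    return min(totals) / number


for stmt in ("Z[...] = 0", "Z.view(np.int8)[...] = 0", "Z.fill(0)"):
    print("%-28s %.0f usec per loop" % (stmt, best_time(stmt) * 1e6))
```

Absolute numbers depend on the machine and the numpy version, as the rest of this thread shows, so only the relative ordering on a single machine is meaningful.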
Thanks for the answer. I will try a little later and post the results here.
Well, unfortunately, right now I'm unable to use numpy via IronPython (I had never come across it before and can't work out how to get numpy without pip). But I've run the same code on another computer and got different results:
100 loops, best of 3: 1.21 msec per loop
100 loops, best of 3: 1.21 msec per loop
100 loops, best of 3: 1.26 msec per loop
100 loops, best of 3: 1.22 msec per loop
100 loops, best of 3: 1.21 msec per loop
10 loops, best of 3: 4.3 msec per loop
10 loops, best of 3: 4.22 msec per loop
100 loops, best of 3: 2.21 msec per loop
100 loops, best of 3: 1.01 msec per loop
Also, the results from PTVS show a different trend:
@ruichernob, I think you are confusing IronPython with IPython. IPython is what you want, not IronPython. You can install IPython into your existing Python using pip:
$ pip install ipython
Hi! To start, thank you for the great tutorial!
I am experiencing the same issue with the timings as the OP. I've measured the following snippets with your timeit function (I've also tested with %timeit; the results are very close):
timeit("Z[...] = 0", globals())
timeit("Z.view(np.float64)[...] = 0", globals())
timeit("Z.view(np.float32)[...] = 0", globals())
timeit("Z.view(np.float16)[...] = 0", globals())
timeit("Z.view(np.complex)[...] = 0", globals())
timeit("Z.view(np.int64)[...] = 0", globals())
timeit("Z.view(np.int32)[...] = 0", globals())
timeit("Z.view(np.int16)[...] = 0", globals())
timeit("Z.view(np.int8)[...] = 0", globals())
timeit("Z.fill(0)", globals())
I've measured on two computers, with:
Python 3.6.4 numpy 1.14.2
The specs of the first computer:
Windows 10
CPU: Intel Xeon E5-1650v4 3.60GHz
RAM: 128GB DDR4-2400
Times:
100 loops, best of 3: 750 usec per loop
100 loops, best of 3: 758 usec per loop
100 loops, best of 3: 757 usec per loop
100 loops, best of 3: 760 usec per loop
100 loops, best of 3: 1.06 msec per loop
100 loops, best of 3: 758 usec per loop
100 loops, best of 3: 757 usec per loop
100 loops, best of 3: 760 usec per loop
100 loops, best of 3: 758 usec per loop
100 loops, best of 3: 747 usec per loop
The specs of the second computer:
Windows 7
CPU: Intel Pentium P6100 2.00GHz
RAM: 4GB DDR3-1333
Times:
100 loops, best of 3: 2.59 msec per loop
10 loops, best of 3: 3.38 msec per loop
10 loops, best of 3: 2.59 msec per loop
100 loops, best of 3: 2.62 msec per loop
100 loops, best of 3: 3.26 msec per loop
100 loops, best of 3: 2.69 msec per loop
100 loops, best of 3: 2.62 msec per loop
100 loops, best of 3: 2.63 msec per loop
10 loops, best of 3: 3.32 msec per loop
100 loops, best of 3: 2.55 msec per loop
As you can see, the results are somewhat consistent with each other, but do not match your observations.
Given the consistent output from you and @ruichernob, it looks like I might be wrong. I don't remember how I came to this conclusion. I'm pretty sure I got the results written in the book, but I might be the only one in the end 😄. Would you mind proposing a PR to fix what's written in the book?
It would be great if you had the opportunity to recheck these results on your computer with the current version of numpy. After all, anything is possible :) And of course the results posted in the book may well have been accurate at the time.
Since the basic idea of this section is that the obvious method is not optimal, a mere change in the timings would make this section meaningless. As for me, the only obvious way to fill an entire array with some value is to use the .fill method of ndarray, and this interface was evidently introduced for this purpose.
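As a small illustration of that point (a sketch with hypothetical array names `A` to `D`, not code from the book), the two spellings produce the same array, while the view trick is only equivalent for the value 0, whose byte pattern is all zeros in every dtype used in this thread:

```python
import numpy as np

# Two "obvious" spellings of the same operation.
A = np.ones(8, np.float32)
B = np.ones(8, np.float32)
A[...] = 0
B.fill(0)
same = np.array_equal(A, B)

# Writing zeros through a byte view also clears the float32 data,
# because 0.0 is represented by all-zero bytes.
C = np.ones(8, np.float32)
C.view(np.int8)[...] = 0

# For any other value the view trick changes the meaning: setting every
# byte to 1 does not set every float32 element to 1.0.
D = np.zeros(8, np.float32)
D.view(np.int8)[...] = 1

print(same, C, D)
```

So even where a view turns out faster on some machine, it can stand in for `.fill` only in the special case of zeroing the buffer.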
I've tried to come up with a similarly simple example where such tricks overtake another obvious way, but unfortunately I have not found one yet :) In addition, "There should be one-- and preferably only one --obvious way to do it." Having said that, if the fresh results you recheck agree with ours, I would just skip this example so as not to mislead readers. I apologize that I cannot offer a replacement example.
On OSX 10.13.3, Python 3.6.4, numpy 1.14.2, I got:
>>> Z.view(np.float16)[...] = 0
100 loops, best of 3: 2.85 msec per loop
>>> Z.view(np.int16)[...] = 0
100 loops, best of 3: 2.87 msec per loop
>>> Z.view(np.int32)[...] = 0
100 loops, best of 3: 1.46 msec per loop
>>> Z.view(np.float32)[...] = 0
100 loops, best of 3: 1.58 msec per loop
>>> Z.view(np.int64)[...] = 0
100 loops, best of 3: 1 msec per loop
>>> Z.view(np.float64)[...] = 0
100 loops, best of 3: 1.01 msec per loop
>>> Z.view(np.complex128)[...] = 0
100 loops, best of 3: 918 usec per loop
>>> Z.view(np.int8)[...] = 0
100 loops, best of 3: 614 usec per loop
Thank you, interesting results! Could you also run timeit with the array.fill method? And if you do not mind, may I ask a question about this on SO?
More or less the same:
>>> Z.view(np.float16).fill(0)
100 loops, best of 3: 2.82 msec per loop
>>> Z.view(np.int16).fill(0)
100 loops, best of 3: 2.82 msec per loop
>>> Z.view(np.int32).fill(0)
100 loops, best of 3: 1.48 msec per loop
>>> Z.view(np.float32).fill(0)
100 loops, best of 3: 1.52 msec per loop
>>> Z.view(np.int64).fill(0)
100 loops, best of 3: 1.05 msec per loop
>>> Z.view(np.float64).fill(0)
100 loops, best of 3: 1.04 msec per loop
>>> Z.view(np.complex128).fill(0)
100 loops, best of 3: 930 usec per loop
>>> Z.view(np.int8).fill(0)
100 loops, best of 3: 601 usec per loop