Anatomy of an array introduction. Obvious way is the fastest.

Open ichernob opened this issue 8 years ago • 10 comments

Hello, I've tried this code:

Z = np.ones(4 * 1000000, np.float32)
timeit("Z[...] = 0", globals())
timeit("Z.view(np.float16)[...] = 0", globals())
timeit("Z.view(np.int16)[...] = 0", globals())
timeit("Z.view(np.int32)[...] = 0", globals())
timeit("Z.view(np.float32)[...] = 0", globals())
timeit("Z.view(np.int64)[...] = 0", globals())
timeit("Z.view(np.float64)[...] = 0", globals())
timeit("Z.view(np.complex128)[...] = 0", globals())
timeit("Z.view(np.int8)[...] = 0", globals())

And got the following results:

100 loops, best of 3: 905 usec per loop
100 loops, best of 3: 918 usec per loop
100 loops, best of 3: 925 usec per loop
100 loops, best of 3: 915 usec per loop
100 loops, best of 3: 910 usec per loop
100 loops, best of 3: 912 usec per loop
100 loops, best of 3: 902 usec per loop
100 loops, best of 3: 1.9 msec per loop
100 loops, best of 3: 1.91 msec per loop

I don't understand why these results are the opposite of what the book reports. Could you kindly clarify? Thanks in advance.

P.S. I'm using the 64-bit version of Python 3.5.2 with Anaconda. The sysinfo() output:

Date: 01/02/17
Python: 3.5.2
Numpy: 1.11.1
Scipy: 0.17.1
Matplotlib: 1.5.1

ichernob avatar Jan 02 '17 06:01 ichernob

Thanks for the report. Your results are surprising. Could you also test using IPython and the %timeit magic (just to be sure I did not mess up the timeit function)?

Note: I edited your post because the listing was not displayed properly.

rougier avatar Jan 02 '17 07:01 rougier
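
For readers without IPython, the same cross-check can be sketched with the standard library `timeit` module instead of the book's helper. This is only a rough sketch: `best_time` and its `number`/`repeat` defaults are hypothetical choices made for illustration, not the book's `timeit` function, and absolute timings will differ between machines.

```python
import timeit

import numpy as np

# Same array as in the report: 4 million float32 values (16 MB).
Z = np.ones(4 * 1000000, np.float32)


def best_time(stmt, number=10, repeat=3):
    """Best per-loop time in seconds for `stmt` (hypothetical helper)."""
    timer = timeit.Timer(stmt, globals={"Z": Z, "np": np})
    # Run `number` loops, `repeat` times, and keep the fastest run.
    return min(timer.repeat(repeat=repeat, number=number)) / number


for stmt in ("Z[...] = 0", "Z.view(np.int8)[...] = 0", "Z.fill(0)"):
    print("%-28s %8.1f usec per loop" % (stmt, best_time(stmt) * 1e6))
```

Taking the minimum over several repeats mirrors the "best of 3" figure reported throughout this thread.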

Thanks for answering. I will try a little later and post the results here.

ichernob avatar Jan 02 '17 07:01 ichernob

Well, unfortunately, right now I'm unable to use numpy via IronPython (I had never come across it before, and I really can't understand how to get numpy without pip). But I ran the same code on another computer and got different results:

100 loops, best of 3: 1.21 msec per loop
100 loops, best of 3: 1.21 msec per loop
100 loops, best of 3: 1.26 msec per loop
100 loops, best of 3: 1.22 msec per loop
100 loops, best of 3: 1.21 msec per loop
10 loops, best of 3: 4.3 msec per loop
10 loops, best of 3: 4.22 msec per loop
100 loops, best of 3: 2.21 msec per loop
100 loops, best of 3: 1.01 msec per loop

Also, the results from PTVS show a different trend: [image]

ichernob avatar Jan 02 '17 17:01 ichernob

@ruichernob, I think you are confusing IronPython with IPython. IPython is what you want, not IronPython. You can install IPython into your existing Python using pip:

$ pip install ipython

claws avatar Jan 06 '17 06:01 claws

Hi! To start, thank you for the great tutorial! I am experiencing the same issue with timings as the OP. I measured the following snippets with your timeit function (I also tested with %timeit; the results are very close):

timeit("Z[...] = 0", globals())
timeit("Z.view(np.float64)[...] = 0", globals())
timeit("Z.view(np.float32)[...] = 0", globals())
timeit("Z.view(np.float16)[...] = 0", globals())
timeit("Z.view(np.complex)[...] = 0", globals())
timeit("Z.view(np.int64)[...] = 0", globals())
timeit("Z.view(np.int32)[...] = 0", globals())
timeit("Z.view(np.int16)[...] = 0", globals())
timeit("Z.view(np.int8)[...] = 0", globals())
timeit("Z.fill(0)", globals())  

I've measured on two computers, both with:

Python 3.6.4
numpy 1.14.2

The specs of the first computer:

Windows 10
CPU: Intel Xeon E5-1650v4 3.60GHz
RAM: 128GB DDR4-2400

Times:

100 loops, best of 3: 750 usec per loop
100 loops, best of 3: 758 usec per loop
100 loops, best of 3: 757 usec per loop
100 loops, best of 3: 760 usec per loop
100 loops, best of 3: 1.06 msec per loop
100 loops, best of 3: 758 usec per loop
100 loops, best of 3: 757 usec per loop
100 loops, best of 3: 760 usec per loop
100 loops, best of 3: 758 usec per loop
100 loops, best of 3: 747 usec per loop

The specs of the second computer:

Windows 7
CPU: Intel Pentium P6100 2.00GHz
RAM: 4GB DDR3-1333

Times:

100 loops, best of 3: 2.59 msec per loop
10 loops, best of 3: 3.38 msec per loop
10 loops, best of 3: 2.59 msec per loop
100 loops, best of 3: 2.62 msec per loop
100 loops, best of 3: 3.26 msec per loop
100 loops, best of 3: 2.69 msec per loop
100 loops, best of 3: 2.62 msec per loop
100 loops, best of 3: 2.63 msec per loop
10 loops, best of 3: 3.32 msec per loop
100 loops, best of 3: 2.55 msec per loop

As you can see, the results are somewhat consistent with each other, but do not match your observations.

godaygo avatar Mar 21 '18 14:03 godaygo

Given the consistent output from you and @ruichernob, it looks like I might be wrong. I don't remember how I came to this conclusion. I'm pretty sure I got the results written in the book, but I might be the only one in the end 😄. Would you mind proposing a PR to fix what's written in the book?

rougier avatar Mar 22 '18 06:03 rougier

It would be great if you had the opportunity to recheck these results on your computer with the current version of numpy. After all, anything is possible :) And of course the results posted in the book may well have been accurate at the time.

Since the basic idea of this section is that the obvious method is not optimal, a mere change in the timings makes this section meaningless. As for me, the only obvious way to fill an entire array with some value is to use the .fill method of ndarray; this interface was evidently introduced for that purpose.

I've tried to come up with a similarly simple example where such tricks outperform the obvious way, but unfortunately I have not found one yet :) In addition, "There should be one-- and preferably only one --obvious way to do it." Having said this, if your rechecked results agree with ours, I would just drop this example so that it is not misleading. I apologize that I cannot offer a replacement example.

godaygo avatar Mar 22 '18 08:03 godaygo
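
Whichever variant turns out fastest on a given machine, all the statements compared in this thread should leave the array in the same state. A minimal sketch that checks this, assuming the same 4,000,000-element float32 array as above (the `names` tuple is just an illustrative list of the dtypes used in the thread):

```python
import numpy as np

Z = np.ones(4 * 1000000, np.float32)

# Dtypes used for the views in this thread (illustrative list).
names = ("int8", "int16", "float16", "int32", "float32",
         "int64", "float64", "complex128")

for name in names:
    Z[...] = 1                           # reset to a non-zero state
    Z.view(np.dtype(name))[...] = 0      # write zeros through a reinterpreted view
    # Every byte of the shared buffer must now be zero.
    assert not Z.view(np.uint8).any(), name

Z[...] = 1
Z.fill(0)                                # the dedicated ndarray method
assert not Z.view(np.uint8).any()
```

Since the results are identical, choosing between `Z[...] = 0`, `Z.fill(0)`, and a view-based assignment is purely a question of readability and measured speed.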

On OSX 10.13.3, Python 3.6.4, numpy 1.14.2, I got:

>>> Z.view(np.float16)[...] = 0
100 loops, best of 3: 2.85 msec per loop
>>> Z.view(np.int16)[...] = 0
100 loops, best of 3: 2.87 msec per loop
>>> Z.view(np.int32)[...] = 0
100 loops, best of 3: 1.46 msec per loop
>>> Z.view(np.float32)[...] = 0
100 loops, best of 3: 1.58 msec per loop
>>> Z.view(np.int64)[...] = 0
100 loops, best of 3: 1 msec per loop
>>> Z.view(np.float64)[...] = 0
100 loops, best of 3: 1.01 msec per loop
>>> Z.view(np.complex128)[...] = 0
100 loops, best of 3: 918 usec per loop
>>> Z.view(np.int8)[...] = 0
100 loops, best of 3: 614 usec per loop

rougier avatar Mar 23 '18 05:03 rougier

Thank you, interesting results! Could you also time the array.fill method? And, if you do not mind, may I ask a question about this on SO?

godaygo avatar Mar 23 '18 07:03 godaygo

More or less the same:

>>> Z.view(np.float16).fill(0)
100 loops, best of 3: 2.82 msec per loop
>>> Z.view(np.int16).fill(0)
100 loops, best of 3: 2.82 msec per loop
>>> Z.view(np.int32).fill(0)
100 loops, best of 3: 1.48 msec per loop
>>> Z.view(np.float32).fill(0)
100 loops, best of 3: 1.52 msec per loop
>>> Z.view(np.int64).fill(0)
100 loops, best of 3: 1.05 msec per loop
>>> Z.view(np.float64).fill(0)
100 loops, best of 3: 1.04 msec per loop
>>> Z.view(np.complex128).fill(0)
100 loops, best of 3: 930 usec per loop
>>> Z.view(np.int8).fill(0)
100 loops, best of 3: 601 usec per loop

rougier avatar Mar 28 '18 12:03 rougier
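
When comparing the timings above, it may help to remember that every view covers the same 16,000,000 bytes; only the element width, and therefore the element count, changes. The sketch below just prints those counts under the same array assumption as before. It does not by itself explain the timing differences, which vary between the machines reported in this thread.

```python
import numpy as np

Z = np.ones(4 * 1000000, np.float32)

sizes = {}
for name in ("int8", "int16", "float16", "int32", "float32",
             "int64", "float64", "complex128"):
    V = Z.view(np.dtype(name))       # reinterpret the same buffer, no copy
    assert V.nbytes == Z.nbytes      # total byte count never changes
    sizes[name] = V.size
    print("%-10s itemsize=%2d  elements=%d" % (name, V.itemsize, V.size))
```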