Fail to reproduce timings

Open serge-sans-paille opened this issue 10 years ago • 6 comments

I installed hope from git and ran the following:

import numpy as np
import hope
@hope.jit
def pdf(density, dims, center, w2D, r50, b, a):
    for x in range(dims[0]):
        for y in range(dims[1]):
            dr = np.sqrt((x - center[0]) ** 2 + (y - center[1]) ** 2)
            density[x, y] = np.sum(w2D * 2 * (b - 1) / (2 * np.pi * (r50 * a)**2) * (1 + (dr / (r50 * a))**2)**(-b))
    return density

with:

 python -m timeit -s 'import numpy as np; b = 3.5; a = 1. / np.sqrt(2. ** (1. / (b - 1.)) - 1.) ; r50=20;center = np.array([10.141, 10.414]);dims = np.array([20, 20]) ; x1D = np.array([ 0.5 - 0.9491079123427585245262 / 2 , 0.5 - 0.7415311855993944398639 / 2 , 0.5 - 0.4058451513773971669066 / 2 , 0.5 , 0.5 + 0.4058451513773971669066 / 2 , 0.5 + 0.7415311855993944398639 / 2 , 0.5 + 0.9491079123427585245262 / 2 ], dtype=np.float32) ; w1D = np.array([ 0.1294849661688696932706 / 2 , 0.2797053914892766679015 / 2 , 0.38183005050511894495 / 2 , 0.4179591836734693877551 / 2 , 0.38183005050511894495 / 2 , 0.2797053914892766679015 / 2 , 0.1294849661688696932706 / 2 ], dtype=np.float32) ; w2D = np.outer(w1D, w1D) ; from pdf import pdf; density = np.zeros(dims, dtype=np.float32)' 'pdf(density, dims, center, w2D, r50, b, a)'

and the result is considerably slower than the expected timing. The C++ module runs at the expected speed, so what did I do wrong?

serge-sans-paille, Feb 23 '15 08:02
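For readability, the timeit one-liner above can be unrolled into a script. This is only a sketch under stated assumptions: the function body is the one from the issue with the @hope.jit decorator left off, so it measures a plain interpreted NumPy baseline rather than HOPE output. The x1D array from the original setup is omitted because the call never uses it, and the names raw and best_ms are illustrative.

```python
import timeit

import numpy as np


def pdf(density, dims, center, w2D, r50, b, a):
    # Same body as in the issue, but WITHOUT @hope.jit (assumption: plain
    # NumPy baseline, so absolute timings are not comparable to HOPE's).
    for x in range(dims[0]):
        for y in range(dims[1]):
            dr = np.sqrt((x - center[0]) ** 2 + (y - center[1]) ** 2)
            density[x, y] = np.sum(
                w2D * 2 * (b - 1) / (2 * np.pi * (r50 * a) ** 2)
                * (1 + (dr / (r50 * a)) ** 2) ** (-b)
            )
    return density


# Setup values copied from the timeit -s string in the issue.
b = 3.5
a = 1.0 / np.sqrt(2.0 ** (1.0 / (b - 1.0)) - 1.0)
r50 = 20
center = np.array([10.141, 10.414])
dims = np.array([20, 20])

# Halved 7-point quadrature weights, as in the issue.
w1D = np.array(
    [w / 2 for w in (
        0.1294849661688696932706,
        0.2797053914892766679015,
        0.38183005050511894495,
        0.4179591836734693877551,
        0.38183005050511894495,
        0.2797053914892766679015,
        0.1294849661688696932706,
    )],
    dtype=np.float32,
)
w2D = np.outer(w1D, w1D)
density = np.zeros(dims, dtype=np.float32)

# Mirror "best of 3": three repeats of 10 calls, keep the fastest repeat.
raw = timeit.repeat(
    lambda: pdf(density, dims, center, w2D, r50, b, a),
    number=10,
    repeat=3,
)
best_ms = min(raw) / 10 * 1e3
print(f"best of 3: {best_ms:.3f} msec per loop")
```

Swapping the local pdf for the jitted one (from pdf import pdf, as in the original command) should give the HOPE timing with the same inputs.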

@serge-sans-paille I copy-pasted your code and got: 10000 loops, best of 3: 103 usec per loop

When I compile the C++ code that we provide in the benchmarks and then measure the timing (using pdf = __import__("pdf", globals(), locals(), [], -1).run), I get: 10000 loops, best of 3: 55.1 usec per loop

This factor of 2 is expected.

What is your OS and compiler?

cosmo-ethz, Feb 23 '15 09:02

OS: linux/debian/testing
compiler: c++ --version reports g++-4.9.real (Debian 4.9.1-19) 4.9.1

serge-sans-paille, Feb 23 '15 09:02

Admittedly, I have little experience with this combination (HOPE on Debian with g++ 4.9).

What are the timings you get for the C++ and the jitted PDF code?

What compile flags did you use to compile the C++ code, and what is HOPE using? (Add import hope; hope.config.verbose = True; to the call to see them.)

cosmo-ethz, Feb 23 '15 09:02

pdf(float32^2 density, int64^1 dims, float64^1 center, float32^2 w2D, int64 r50, float64 b, float64 a)
    for x.l in (0.J:dims.l[0.J]) {
        for y.l in (0.J:dims.l[1.J]) {
            new dr.d
            dr.d = numpy.sqrt((((x.l - center.d[0.J]) ** 2.J) + ((y.l - center.d[1.J]) ** 2.J)))
            new __sum0.d
            __sum0.d = numpy.sum(((((w2D.f[:w2D@0,:w2D@1] * 2.J) * (b.D - 1.J)) / ((2.J * 3.141592653589793.D) * ((r50.J * a.d) ** 2.J))) * ((1.J + ((dr.d / (r50.J * a.d)) ** 2.J)) ** -b.D)))
            density.f[x.l, y.l] = __sum0.d
        }
    }
    return density.f[:density@0,:density@1]

Compiling following functions:
pdf(float32^2 density, int64^1 dims, float64^1 center, float32^2 w2D, int64 r50, float64 b, float64 a)
running build_ext
building 'pdf_e5a322486ef195fd39b6f44cfb7c3a45dd00790dd0bc04544bbea1ae_0' extension
C compiler: x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -fno-strict-aliasing -g -O2 -fPIC

compile options: '-I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c'
extra options: '-Wall -Wno-unused-variable -std=c++11'
x86_64-linux-gnu-gcc: /tmp/hope2H9vzI/pdf_e5a322486ef195fd39b6f44cfb7c3a45dd00790dd0bc04544bbea1ae_0.cpp
c++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wl,-z,relro -g -O2 /tmp/hope2H9vzI/pdf_e5a322486ef195fd39b6f44cfb7c3a45dd00790dd0bc04544bbea1ae_0.o -o /tmp/hope2H9vzI/pdf_e5a322486ef195fd39b6f44cfb7c3a45dd00790dd0bc04544bbea1ae_0.so

10 loops, best of 3: 1.41 msec per loop

serge-sans-paille, Feb 23 '15 09:02

and 1.32ms when compiling with clang

serge-sans-paille, Feb 23 '15 09:02
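To put the numbers quoted so far side by side, the ratios can be worked out directly. This is just arithmetic on the timings reported in this thread; the variable names are illustrative, and because the measurements come from different machines the cross-machine ratios are only a rough indication. The HOPE-to-C++ ratio on the maintainer's side comes out at about 1.9 (the "factor of 2"), while the Debian timings are roughly 13 times the 103 usec HOPE timing.

```python
# Timings quoted in the thread, all converted to microseconds per loop.
hope_maintainer_us = 103.0   # HOPE-jitted pdf, maintainer's run
cpp_maintainer_us = 55.1     # hand-written C++ benchmark, maintainer's run
hope_gcc_us = 1410.0         # HOPE-jitted pdf on Debian, g++ 4.9 (1.41 msec)
hope_clang_us = 1320.0       # HOPE-jitted pdf on Debian, clang (1.32 msec)

# Ratio the maintainer calls "expected" (HOPE vs. native C++).
hope_vs_cpp = hope_maintainer_us / cpp_maintainer_us

# How much slower the Debian runs are than the maintainer's HOPE timing.
gcc_vs_reference = hope_gcc_us / hope_maintainer_us
clang_vs_reference = hope_clang_us / hope_maintainer_us

print(f"HOPE / C++ (maintainer):          {hope_vs_cpp:.2f}x")
print(f"Debian g++ / maintainer's HOPE:   {gcc_vs_reference:.2f}x")
print(f"Debian clang / maintainer's HOPE: {clang_vs_reference:.2f}x")
```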

@serge-sans-paille I was able to reproduce the behavior you see on an Ubuntu box. It seems that the other benchmarks are doing all right and only the star-psf benchmark is causing issues.

As expected, the code that HOPE generates is identical on OSX and Ubuntu. This leads me to assume that the compilers on Linux struggle to optimize the code as well as clang does on OSX. This isn’t very satisfying, but I don’t have a better explanation at the moment.

cosmo-ethz, Feb 25 '15 11:02