Slowdown above some array-size threshold

Open · jzluo opened this issue 1 year ago • 1 comment

Sorry, wasn't sure how to title this. Continued from #2016.

Thanks so much for all your work. It is much improved, and for my case of 20 columns it is no longer slower than NumPy. I played around with it a little more and tested an equivalent expression, (X.T * rootW).T (call it XtW), in addition to the original rootW[:, np.newaxis] * X (call it XW). Please see the plot: XtW takes no performance hit with SIMD enabled, whereas XW is still slower than NumPy once the number of columns exceeds 20 in my example. However, at some point (>200 columns in my case) they all become much slower for some reason, which I suppose belongs in a different issue.
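For reference, the equivalence of the two expressions can be checked in plain NumPy. This is only a minimal sketch: `get_XW` is named later in the benchmark but its body is not shown in this issue, so the version here is an assumption, and `get_XtW` is a hypothetical name introduced for the comparison. The array sizes are also arbitrary.

```python
import numpy as np


def get_XW(X, preds):
    # Assumed pure-NumPy reference; the issue names get_XW but does not show it.
    rootW = np.sqrt(preds * (1 - preds))
    # Scale row i of X by rootW[i] by broadcasting a column vector.
    return rootW[:, np.newaxis] * X


def get_XtW(X, preds):
    # Hypothetical NumPy counterpart of XtW_pythran.
    rootW = np.sqrt(preds * (1 - preds))
    # Scale column i of X.T by rootW[i], then transpose back.
    return (X.T * rootW).T


rng = np.random.default_rng(0)
X = rng.random((1000, 20))   # arbitrary small test shape
preds = rng.random(1000)

XW = get_XW(X, preds)
XtW = get_XtW(X, preds)

# Both forms compute rootW[i] * X[i, j] for every element.
print(XW.shape, XtW.shape, np.allclose(XW, XtW))
```

The two forms differ only in how the broadcast is expressed, so any timing difference between XW and XtW comes from how the expression is evaluated, not from the arithmetic itself.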

import numpy as np

#pythran export XW_pythran(float[:,:], float[])
def XW_pythran(X, preds):
    rootW = np.sqrt(preds * (1 - preds))
    XW = rootW[:, np.newaxis] * X
    return XW

#pythran export XtW_pythran(float[:,:], float[])
def XtW_pythran(X, preds):
    rootW = np.sqrt(preds * (1 - preds))
    XW = (X.T * rootW).T
    return XW

# for plot
import perfplot

np.random.seed(0)
preds = np.random.random(20000)
perfplot.show(
    setup=lambda n: np.random.rand(20000, n),
    kernels=[
        lambda X: get_XW(X, preds),  # pure numpy version of XW_pythran
        lambda X: XW_pythran(X, preds),   # -O3 -march=native
        lambda X: XW_pythran_simd(X, preds),  # -O3 -march=native -DUSE_XSIMD
        lambda X: XtW_pythran(X, preds),
        lambda X: XtW_pythran_simd(X, preds)
    ],
    labels=["np", "pythran_XW", "pythran_simd_XW", "pythran_XtW", "pythran_simd_XtW"],
    n_range=list(range(20, 280, 20)),
    xlabel="n_cols",
    relative_to=0,
)
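When reading the plot, it may help to have the array sizes next to the column counts. The sketch below only does arithmetic on the shapes used in the benchmark (20000 rows, float64 elements); `footprint_mib` is a hypothetical helper name. It does not establish that the slowdown above 200 columns is related to working-set size; that remains an open question here.

```python
N_ROWS = 20000   # rows used in the benchmark setup
ITEMSIZE = 8     # bytes per float64 element


def footprint_mib(n_cols, n_rows=N_ROWS, itemsize=ITEMSIZE):
    """Size in MiB of one (n_rows, n_cols) float64 array."""
    return n_rows * n_cols * itemsize / 2**20


# Same column counts as n_range in the benchmark.
for n_cols in range(20, 280, 20):
    one = footprint_mib(n_cols)
    # Each kernel reads X and writes a result of the same shape.
    print(f"{n_cols:4d} cols: X = {one:6.2f} MiB, X + result = {2 * one:6.2f} MiB")
```

Comparing these figures with the cache sizes of the machine that produced the plot would be one way to narrow down where the threshold comes from.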

[Figure: Screenshot from 2022-09-13 00-22-30 (perfplot output, runtime relative to the NumPy kernel vs. n_cols)]

Originally posted by @jzluo in https://github.com/serge-sans-paille/pythran/issues/2016#issuecomment-1244883738

jzluo, Sep 13 '22 19:09