
Remove dot() function

Open hiccup7 opened this issue 9 years ago • 11 comments

Tests performed on 64-bit Windows Intel Haswell CPU with Julia v0.3.7 using OpenBLAS v0.2.14 extracted from Julia v0.4 nightly:

blas_set_num_threads(CPU_CORES)
const v=ones(Float64,100000)
@time for k=1:1000000;s=dot(v,v);end
#23.7 seconds, single-threaded

using Yeppp
const v=ones(Float64,100000)
@time for k=1:1000000;s=Yeppp.dot(v,v);end
#22.4 seconds, single-threaded

Test performed on the same PC with MKL BLAS on Python:

import numpy as np
from scipy.linalg.blas import ddot
from timeit import default_timer as timer
v = np.ones(100000)
start = timer()
for k in range(1000000):
    s = ddot(v,v)
exec_time=(timer() - start)
print
print("Execution took", str(round(exec_time, 3)), "seconds")
#7.5 seconds, multi-threaded

Thus, Yeppp is about 5% faster than OpenBLAS v0.2.14 but takes roughly three times as long as the multi-threaded MKL BLAS run (22.4 s vs. 7.5 s). I would rather not keep Yeppp's dot() function as a negligible workaround for an OpenBLAS v0.2.14 problem.
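As a rough check of those figures, the timings from the three tests can be compared directly. This is only a sketch: the durations are copied from the test comments above, the variable names are illustrative, and the MKL run was multi-threaded while the other two were single-threaded, so the last ratio is not a like-for-like kernel comparison.

```python
# Wall-clock times (seconds) copied from the three test runs above.
openblas_s = 23.7   # Julia dot() with OpenBLAS v0.2.14, single-threaded
yeppp_s = 22.4      # Yeppp.dot(), single-threaded
mkl_s = 7.5         # SciPy ddot with MKL, multi-threaded

# Fraction of the OpenBLAS run time that Yeppp saves.
yeppp_saving = (openblas_s - yeppp_s) / openblas_s

# How many times longer the Yeppp run takes than the MKL run.
yeppp_vs_mkl = yeppp_s / mkl_s

print(f"Yeppp saves {yeppp_saving:.1%} of the OpenBLAS run time")
print(f"Yeppp takes {yeppp_vs_mkl:.2f}x as long as MKL")
```

The saving over OpenBLAS comes out at a little over 5%, and the Yeppp run takes just under three times as long as the MKL run.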

I propose that we remove the dot() function. To get Julia to match Python's speed, I propose instead: (a) ship Julia with MKL (see https://github.com/JuliaLang/julia/issues/10969), and (b) work with the OpenBLAS project to correct the dot product performance problems. The develop branch of OpenBLAS may already work. See the commit from April 24th entitled "bugfixes: replaced int with BLASLONG".

I am concerned that users of Yeppp.dot() will miss the roughly threefold speed improvement coming in the base. It would be better to remove it before Yeppp becomes popular in Julia.

hiccup7 avatar Apr 27 '15 21:04 hiccup7

That is very strange. In my tests, the Yeppp! dot product outperformed MKL. I'll look into it; it is probably a Windows-specific problem.

Maratyszcza avatar Apr 27 '15 21:04 Maratyszcza

I don't see anything suspicious in the assembly code. Probably, the result is due to passing the same vector v for both arguments. Maybe MKL has an optimization for this specific case (Yeppp! has this optimization too, but as a separate SumSquares function).

Maratyszcza avatar Apr 27 '15 21:04 Maratyszcza

Hold on, isn't the MKL number multi-threaded?

Keno avatar Apr 27 '15 21:04 Keno

@Keno probably. There are separate .so libraries for multithreaded and non-multithreaded versions; but I am not sure which is used by Julia.

Maratyszcza avatar Apr 27 '15 21:04 Maratyszcza

@Maratyszcza, I don't understand your last comment. Since I'm using Windows, I'm using yeppp.dll, not a .so library. Since Julia doesn't presently use BinDeps, each user must copy the appropriate library from the yeppp-1.0.0.zip file linked in README.md. Since I am using a 64-bit Windows OS, I chose yeppp.dll in the binaries\windows\amd64 folder of yeppp-1.0.0.zip. I see no options for multithreaded vs. non-multithreaded versions.

hiccup7 avatar Apr 28 '15 17:04 hiccup7

@hiccup7 This comment is about MKL, Yeppp! is always single-threaded.

Maratyszcza avatar Apr 28 '15 17:04 Maratyszcza

@Maratyszcza, thanks for the clarification. @Keno, my CPU meter showed that MKL is multi-threaded with Python. I included this aspect in the comment for each test in the first post. The OpenBLAS develop branch is already multi-threaded for dot products of reals, or at least it will be when it meets or exceeds MKL's performance.

hiccup7 avatar Apr 28 '15 18:04 hiccup7

I think we can leave dot in there. It is in the Yeppp namespace anyways, and it is handy to have when not using MKL or some other BLAS library.

ViralBShah avatar Apr 29 '15 04:04 ViralBShah

When Julia is built without any BLAS library, which library is used for dot products? If Yeppp provides a faster dot product than the existing non-BLAS method, then the base could use the Yeppp function under the hood.

My intention is that the base provides the fastest library for dot(), regardless of which BLAS library, if any, is used. This will make Julia code more portable. If Yeppp's dot() is desired for non-BLAS builds, then we could at least avoid exporting the dot() method from the Yeppp module so that novice users will use the faster base dot() method instead.

hiccup7 avatar Apr 29 '15 16:04 hiccup7

Currently, there is no way to build Julia without a BLAS library.

ViralBShah avatar Apr 29 '15 17:04 ViralBShah

For the dot(x, y) function, is it possible to test whether arrays x and y share the same pointer and then choose which Yeppp! function to use?

dot(x, x) and x = y; dot(x, y) should have the same performance in Julia since x and y are sharing the same data.
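A minimal sketch of that dispatch idea, written in Python (the language of the MKL test earlier in the thread) rather than Julia. The names `sum_squares` and `general_dot` are hypothetical stand-ins, not Yeppp! API; a real wrapper would call the library's sum-of-squares and dot-product routines in their place.

```python
from array import array


def sum_squares(x):
    """Hypothetical stand-in for a specialised sum-of-squares kernel."""
    return sum(a * a for a in x)


def general_dot(x, y):
    """Hypothetical stand-in for a general dot-product kernel."""
    return sum(a * b for a, b in zip(x, y))


def dot(x, y):
    """Pick the kernel based on whether x and y share the same buffer."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # buffer_info() returns (address, item count) of the underlying storage,
    # so equal tuples mean both arguments point at the same data.
    if x.buffer_info() == y.buffer_info():
        return sum_squares(x)
    return general_dot(x, y)


x = array("d", [1.0, 2.0, 3.0])
y = x                # alias: same storage as x
z = array("d", x)    # copy: equal values, different storage

print(dot(x, x), dot(x, y), dot(x, z))
```

Here `dot(x, x)` and `dot(x, y)` both take the sum-of-squares path because `y` is an alias of `x`, while `dot(x, z)` falls back to the general kernel. The check costs one address comparison per call.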

GaZ3ll3 avatar Jul 06 '15 21:07 GaZ3ll3