Yeppp.jl
Remove dot() function
Tests performed on a 64-bit Windows PC with an Intel Haswell CPU, running Julia v0.3.7 with OpenBLAS v0.2.14 extracted from a Julia v0.4 nightly:
```julia
blas_set_num_threads(CPU_CORES)
const v = ones(Float64, 100000)
@time for k = 1:1000000; s = dot(v, v); end
# 23.7 seconds, single-threaded
```
```julia
using Yeppp
const v = ones(Float64, 100000)
@time for k = 1:1000000; s = Yeppp.dot(v, v); end
# 22.4 seconds, single-threaded
```
Test performed on the same PC with MKL BLAS from Python:

```python
import numpy as np
from scipy.linalg.blas import ddot
from timeit import default_timer as timer

v = np.ones(100000)
start = timer()
for k in range(1000000):
    s = ddot(v, v)
exec_time = timer() - start
print()
print("Execution took", str(round(exec_time, 3)), "seconds")
# 7.5 seconds, multi-threaded
```
Thus, Yeppp is about 5% faster than OpenBLAS v0.2.14 but roughly 3x slower than MKL BLAS. I would rather not keep Yeppp's dot() function as a negligible work-around for an OpenBLAS v0.2.14 problem.
I propose that we remove the dot() function. To get Julia to match Python's speed, I propose instead:
a) Ship Julia with MKL; see https://github.com/JuliaLang/julia/issues/10969
b) Work with the OpenBLAS project to correct the dot-product performance problems. The develop branch of OpenBLAS may already work; see the commit from April 24th entitled "bugfixes: replaced int with BLASLONG".
I am concerned that users of Yeppp.dot() will miss the roughly 3x speed improvement coming in Base. Better to remove it before Yeppp becomes popular in Julia.
That is very strange. In my tests the Yeppp! dot product outperformed MKL. I'll look into it; it is probably a Windows-specific problem.
I don't see anything suspicious in the assembly code. The result is probably due to passing the same vector `v` for both arguments. Maybe MKL has an optimization for this specific case (Yeppp! has this optimization too, but as a separate `SumSquares` function).
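To illustrate the aliasing point, here is a minimal Python sketch (not the actual Yeppp! or MKL implementation) of a dot-product wrapper that detects when both arguments are the same object and dispatches to a one-pass sum-of-squares kernel, analogous to having a separate `SumSquares` function:

```python
# Minimal sketch (assumption): a dot-product wrapper that special-cases
# the aliased call dot(v, v) by dispatching to a dedicated
# sum-of-squares kernel, as a BLAS-like library could do internally.

def sum_squares(x):
    # Reads each element once instead of twice.
    return sum(xi * xi for xi in x)

def dot(x, y):
    if x is y:                      # both names bind the same buffer
        return sum_squares(x)
    return sum(a * b for a, b in zip(x, y))

v = [2.0] * 3
print(dot(v, v))          # 12.0 via the sum-of-squares path
print(dot(v, list(v)))    # 12.0 via the general path
```

Benchmarking with two distinct vectors would rule out this special case as the source of the timing difference.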
Hold on, isn't the MKL number multi-threaded?
@Keno probably. There are separate `.so` libraries for the multithreaded and non-multithreaded versions, but I am not sure which one Julia uses.
@Maratyszcza, I don't understand your last comment. Since I'm using Windows, I'm using `yeppp.dll`, not a `.so` library. Since Julia doesn't presently use BinDeps, each user must copy the appropriate library from the `yeppp-1.0.0.zip` file linked in README.md. Since I am using a 64-bit Windows OS, I chose the `yeppp.dll` in the `binaries\windows\amd64` folder of `yeppp-1.0.0.zip`. I see no options for multithreaded vs. non-multithreaded versions.
@hiccup7 This comment is about MKL; Yeppp! is always single-threaded.
@Maratyszcza , thanks for the clarification.
@Keno, my CPU meter showed that MKL is multi-threaded with Python. I noted this aspect in the comment for each test in the first post. The OpenBLAS `develop` branch is already multi-threaded for dot products of reals, or at least it will be when it meets or exceeds MKL's performance.
I think we can leave `dot` in there. It is in the Yeppp namespace anyway, and it is handy to have when not using MKL or some other BLAS library.
When Julia is built without any BLAS library, which library is used for dot products? If Yeppp provides a faster dot product than the existing non-BLAS method, then the base could use the Yeppp function under the hood.
My intention is that Base provides the fastest library for dot(), regardless of which BLAS library, if any, is used. This will make Julia code more portable. If Yeppp's dot() is desired for non-BLAS builds, then we could at least avoid exporting the dot() method from the Yeppp module, so that novice users will use the faster Base dot() method instead.
Currently, there is no way to build Julia without a BLAS library.
For the `dot(x, y)` function, is it possible to test whether arrays `x` and `y` share the same pointer and then choose which Yeppp! function to use? `dot(x, x)` and `x = y; dot(x, y)` should have the same performance in Julia, since `x` and `y` share the same data.
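A pointer check along those lines can be sketched in Python with the standard-library `array` module, as a stand-in for comparing `pointer(x) == pointer(y)` in Julia (the kernel choice here is illustrative, not Yeppp!'s actual dispatch):

```python
# Sketch (assumption): choose a kernel based on whether the two
# arguments share the same underlying storage, mirroring a
# pointer(x) == pointer(y) test in Julia.
from array import array

def dot(x, y):
    # buffer_info() returns (address, element_count) of the C buffer,
    # so equal tuples mean the arguments share the same storage.
    if x.buffer_info() == y.buffer_info():
        return sum(xi * xi for xi in x)      # SumSquares-style path
    return sum(a * b for a, b in zip(x, y))

x = array('d', [1.0, 2.0, 3.0])
y = x                          # y aliases x's data, as x = y does in Julia
print(dot(x, y))               # 14.0 via the shared-pointer path
print(dot(x, array('d', x)))   # 14.0 via the general path
```

Unlike an object-identity check, a pointer comparison would also catch two distinct array wrappers over the same buffer.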